Abstract. We review the physical processes that are thought to produce 
anisotropy in the cosmic microwave background, focusing primarily (but 
not exclusively) on the effects of acoustic waves in the early Universe. We 
attempt throughout to supply an intuitive, physical picture of the key ideas 
and to elucidate the ways in which the predicted anisotropy depends on 
cosmological parameters such as $\Omega$ and $h$. The second half of these lectures is 
devoted to a discussion of microwave background data analysis techniques, 
with an emphasis on the analysis of the COBE DMR data. In particular, 
the Karhunen-Loeve method of data compression is described in detail. 



1. Introduction 

Since the discovery four years ago of cosmic microwave background (CMB) 
fluctuations (Smoot et al. 1992), the data from anisotropy experiments have 
improved in both quality and quantity at a very rapid pace. CMB data al- 
ready provide stringent constraints on cosmological models, and with a 
plethora of balloon-borne and ground-based experiments underway and 
two planned satellite missions, we can expect further dramatic improve- 
ment over the next decade. In fact, there is a very real possibility that we 
will accurately measure many of the most important cosmological parame- 
ters via the CMB anisotropy spectrum (Jungman et al. 1996, Kosowsky et 
al. 1996). 

In order to realize this promise, we must take great care in developing 
tools for comparing observational data with theoretical predictions. Even 
with existing data, this process is far from trivial, and with the much larger 
data sets of the near future the task will become trickier. There are at least 
two independent problems to be faced: we must be able to make accurate 
predictions of the anisotropy spectrum for any particular theory, and we 
must develop adequate statistical techniques to facilitate the comparison 
of these predictions with observations.^ 

These lectures are concerned with these two subjects. We will first review 
the primary physical mechanisms that are thought to be responsible 
for generating CMB anisotropies. The emphasis in this half of the lectures 
will be on building an intuitive picture of the relevant physical effects. 
We will therefore give ourselves free rein to make physically motivated ap- 
proximations, rather than trying to treat the rather involved subject of 
anisotropy formation with complete precision. This section of the lectures 
will draw heavily on the work of Wayne Hu and Naoshi Sugiyama (Hu 
& Sugiyama 1994, 1995a, 1995b, 1996; Hu 1995), as well as on a review 
article by Hu, Sugiyama, & Silk (1996) and two previous summer-school 
proceedings on the subject (Hu 1996, Tegmark 1996c). 

The second half of these lectures is devoted to issues of statistics and 
data analysis. We will study various ways in which theoretical predictions 
of CMB anisotropy may be compared with data sets. Our primary focus 
will be on methods for analyzing the COBE DMR data, since this is the 
largest and most powerful CMB data set in existence; however, many of 
the issues that arise in analyzing the COBE data are directly relevant to 
analyses of other experiments, both present and future. For example, we 
will pay special attention to the issue of data compression; this subject was 
fairly important in analyzing the COBE data, and its importance will only 
increase as CMB data sets get larger and larger. In particular, the planned 
MAP and COBRAS/SAMBA missions will both return data sets several 
orders of magnitude larger than COBE, and their analysis will therefore 
require extensive data compression. 

These lectures are organized as follows. Section 2 provides an overview 
of the key physical processes that produce CMB anisotropy. Section 3 
discusses the primary anisotropy, including the Sachs-Wolfe effect (Sachs 
& Wolfe 1967) and anisotropies produced by acoustic oscillations of the 
photon-baryon fluid (Peebles & Yu 1970; Doroshkevich, Zel'dovich, & Sunyaev 
1978; Bond & Efstathiou 1984), as well as the diffusive damping of 
fluctuations (Silk 1968). In Section 4 we discuss anisotropies produced after 
last scattering, such as the integrated Sachs-Wolfe effect (Sachs & Wolfe 
1967, Rees & Sciama 1968), the effect of gravitational lensing (Blandford & 
Narayan 1992, Seljak 1996b), and reionization (Sunyaev 1977, Silk 1982). 
Section 5 attempts to synthesize the main ideas of the previous sections 
and concludes the first half of these lectures. 



^Not to mention the far more difficult task of actually gathering the data! 
The second half, which concerns issues of statistics and data analysis, 
begins with Section 6, in which we establish some basic results and notation 
having to do with Gaussian random processes on the sphere. Section 7 
presents a series of idealized thought experiments designed to introduce 
some of the key issues of CMB data analysis. This section also contains a 
digression on Bayesian and frequentist statistical techniques. In Section 8, 
we apply what we have learned to an analysis of the four-year COBE DMR 
data, and Section 9 contains some brief concluding remarks. 

2. An Overview of Anisotropy Formation 

CMB anisotropies encode large amounts of information about the Universe. 
Physical processes around the redshift of last scattering (typically $z \sim 1100$) 
produce the primary anisotropy, which can be significantly altered 
by secondary processes between the last-scattering surface and the present. 
In addition, the angular scale subtended by a particular source of anisotropy 
depends on the spatial geometry as well as the distance to the last-scattering 
surface. 

With the exception of some effects at very low redshift, and ignoring 
topological defect models, calculations of CMB anisotropy are done in linear 
perturbation theory. All of the relevant quantities are small perturbations 
about a homogeneous Friedmann-Robertson-Walker solution. Nonetheless, 
making accurate predictions of the CMB anisotropy in a particular theory 
is a daunting numerical task. In a typical cold dark matter 
(CDM) model, the variables one must keep track of include 

• $\delta_B = \delta\rho_B/\rho_B$, the baryon density perturbation. 

• $\delta_{\rm CDM} = \delta\rho_{\rm CDM}/\rho_{\rm CDM}$, the perturbation in the CDM density. 

• $\mathbf{v}_B$, the baryon peculiar velocity field. 

• $\mathbf{v}_{\rm CDM}$, the CDM peculiar velocity field. 

• $\Psi$, essentially the Newtonian gravitational potential. 

• $\Phi$, the perturbation to the spatial curvature.^ 

• $f_\gamma$, the photon phase-space distribution function. 

• $f_\nu$, the neutrino phase-space distribution function. 

All of these quantities depend on position $\mathbf{x}$ and time $t$, and $f_\gamma$ and $f_\nu$ 
are also momentum-dependent. Their evolution is governed by a nasty set 
of coupled partial differential equations. For the nonrelativistic species, we 
must keep track of the usual equations of perturbation theory, namely the 

^We will work throughout in Newtonian gauge. For our purposes $\Psi$ and $\Phi$ are the only 
important perturbations to the metric. $\Psi$ is related to the perturbation to the time-time 
component $g_{00}$ of the metric, and $\Phi$ has to do with the perturbation to the spatial part 
$g_{ij}$. For more information on gauges, see the contribution of J.-L. Sanz to this volume, 
and also Hu (1995, 1996) and references therein. 
continuity equation, the Euler equation, and the Poisson equation. For the 
CDM, these equations look like 

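$$\dot\delta_{\rm CDM} + \nabla\cdot\mathbf{v}_{\rm CDM} = 0, \qquad (1)$$

$$\dot{\mathbf{v}}_{\rm CDM} + 2\frac{\dot a}{a}\mathbf{v}_{\rm CDM} = -\frac{1}{a^2}\nabla\Psi, \qquad (2)$$

$$\nabla^2\Psi = 4\pi G a^2\bar\rho\,\delta. \qquad (3)$$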

Here $a$ is the scale factor, $\bar\rho$ is the average density, and a dot denotes a 
time derivative. All spatial derivatives are taken with respect to comoving 
coordinates. In the last equation, $\delta$ represents the total density perturbation, 
although we will generally consider models that are gravitationally 
dominated by CDM, so that we can replace $\delta$ with $\delta_{\rm CDM}$. There are also 
continuity and Euler equations for the baryons, the latter containing a 
pressure term. 

The relativistic species (photons and neutrinos) are not characterized 
by a simple velocity field, but by a distribution function whose evolution is 
governed by the Boltzmann equation, 

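$$\frac{Df}{Dt} = \frac{\partial f}{\partial t} + \frac{\partial f}{\partial x^i}\frac{dx^i}{dt} + \frac{\partial f}{\partial p}\frac{dp}{dt} + \frac{\partial f}{\partial \gamma^i}\frac{d\gamma^i}{dt} = C[f]. \qquad (4)$$

Here $p$ is the magnitude of the momentum, $\gamma^i$ is a direction cosine of the 
momentum, and $C$ is a collision term having to do with scattering. This 
equation applies to both $f_\gamma$ and $f_\nu$, although at the epochs we are interested 
in the neutrino collision term is zero. 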

In order to make accurate predictions of the CMB anisotropy in a par- 
ticular model, it is necessary to solve this system of equations numerically. 
If we work in Fourier space, we find that different fluctuation modes are 
uncoupled and the solution is therefore greatly simplified. We write 

$$\delta(\mathbf{x},t) = \sum_{\mathbf{k}} \delta_{\mathbf{k}}(t)\exp(i\mathbf{k}\cdot\mathbf{x}), \qquad (5)$$

and similarly for the other quantities. [For the distribution functions, it is 
convenient to make a second expansion in Legendre polynomials $P_l(\hat{\mathbf{k}}\cdot\hat{\mathbf{p}})$.] 
The fact that different $\mathbf{k}$-modes decouple makes the problem computationally 
tractable. Furthermore, as we shall see, the fact that we can work with 
one mode at a time makes it easier to get a conceptual understanding of 
anisotropy formation. 

In recent years excellent codes have been developed for integrating these 
equations. [See Hu et al. (1995) and Bond (1996) for fairly recent discussions 
of the state of the art, and Seljak & Zaldarriaga (1996) for an important 
subsequent development.] We will not discuss the details of such precise 
calculations here; rather, we will follow a less precise but more intuitive 
picture of the formation of anisotropies, based on a series of physically 
motivated approximations. This approach makes it easier to see what the 
important physical processes are and also gives us an understanding of how 
various features in the anisotropy spectrum depend on key cosmological 
parameters. 

We will begin by discussing the sources of primary anisotropy: the Sachs- 
Wolfe effect (Sachs & Wolfe 1967), which describes gravitational red- and 
blueshifts due to potential differences on the surface of last scattering; the 
Doppler effect due to bulk motions of the last-scattering surface (Sunyaev & 
Zel'dovich 1970); and intrinsic temperature variations from point to point 
(Silk 1967). We will then discuss some sources of secondary anisotropy, the 
most important of which is the integrated Sachs-Wolfe (ISW) effect, which 
describes energy changes in photons as they pass through time-varying 
potentials. [This effect was also treated by Sachs & Wolfe (1967), as well as by 
Rees & Sciama (1968) at nearly the same time.] Other secondary sources of 
anisotropy include scattering by reionized matter and gravitational lensing. 

At first, we will consider the evolution of only one Fourier mode at 
a time; however, we will eventually need to synthesize all of the different 
Fourier modes together to see what the total CMB anisotropy on the sky 
looks like. To do that, we will need to know the power spectrum of the den- 
sity perturbation. This is simply the mean-square amplitude of the various 
Fourier modes: 

$$P(k) = \langle|\delta_{\mathbf{k}}|^2\rangle. \qquad (6)$$

(As long as space is isotropic, $P$ depends only on the magnitude of $\mathbf{k}$.) The 
angle brackets here denote an ensemble average, although it is frequently 
acceptable to assume $\delta$ is ergodic, in which case the angle brackets can 
equally well be regarded as a spatial average.^ We often assume that the 
initial power spectrum is a power law in $k$: $P(k) \propto k^n$. As we will see below, 
the analogous quantity for describing the observed CMB anisotropy is the 
angular power spectrum: 

$$C_l = \langle|a_{lm}|^2\rangle. \qquad (7)$$

Here $a_{lm}$ is a coefficient in the spherical harmonic expansion 
of the temperature anisotropy (spherical harmonic expansions being the 
natural analogue of Fourier expansions for data sets that live on the sphere). 
A mode with spherical harmonic index $l$ probes an angular scale on the 
sky of $\sim\pi/l$. In any particular cosmological model, the angular power 
spectrum $C_l$ is related linearly to the matter power spectrum $P(k)$. The 

^Beware: When we describe $\Delta T/T$ as a random field on the sphere, we may not assume 
ergodicity: $\Delta T/T$ is never ergodic. 
Figure 1. The angular power spectrum $l(l+1)C_l$ for a standard cold dark matter model. 
The parameters of this model are as follows: $n = 1$, $h = 0.5$, $\Omega_0 = 1$, $\Omega_B h^2 = 0.013$. This 
power spectrum was computed by N. Sugiyama. 

angular power spectrum for a CDM model is shown in Figure 1.^ The 
primary goal of Section 3 will be to explain the multiple peaks in this 
spectrum. 

3. Primary Anisotropies 

3.1. THE GRAVITATIONAL POTENTIAL 

We will begin by assuming that, after the end of the radiation epoch, most 
of the mass in the Universe is in the form of cold dark matter: 

$$\Omega_B \ll \Omega_{\rm CDM}. \qquad (8)$$

Then the gravitational potential is completely determined by the CDM, and 
the three equations (1)-(3) can be solved for $\Psi$ and $\delta$ without worrying about 
what the other species are doing. Then, once we know the gravitational 
potential $\Psi$, we can solve for the evolution of the photons and baryons. 

^The prefactor $l(l+1)$ in Figure 1 (and all of the other power spectrum plots we will 
see) is traditional. In a flat cosmological model with an $n = 1$ power spectrum, the 
Sachs-Wolfe contribution to the power spectrum is proportional to $1/l(l+1)$. The Sachs-Wolfe 
effect dominates on large scales, explaining the flatness of Figure 1 at low $l$. The quantity 
$l(l+1)C_l$ is also approximately proportional to the total power per logarithmic interval 
in $l$. (To make this proportionality exact, one would use $l(l+\frac{1}{2})C_l$ instead.) 
Equations (1)-(3) can be combined into a single second-order equation 
for $\delta$, 

$$\ddot\delta + 2\frac{\dot a}{a}\dot\delta - 4\pi G\bar\rho\,\delta = 0. \qquad (9)$$

At early times, when the Universe is radiation dominated, the last term in 
this equation is negligible, and the two linearly independent solutions are 
$\delta = {\rm const}$ and $\delta \propto \ln t$. There is therefore little growth during the radiation 
era. 

If the Universe is matter dominated (meaning that both radiation and 
curvature are negligible in the Friedmann equation), then we have $a \propto t^{2/3}$, 
and the solutions are $\delta \propto t^{2/3} \propto a$ and $\delta \propto t^{-1}$. At late times, of course, the 
growing mode is the one that matters. If we plug the matter-dominated 
growing-mode solution into the Poisson equation (3), we find that $\Psi$ is 
independent of time. This is a key fact, to which we will return repeatedly. 
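The growth law is easy to confirm numerically. The following is a minimal Python sketch (our own illustration, not taken from any anisotropy code) that integrates equation (9) for a flat, matter-dominated background, assuming $a \propto t^{2/3}$ so that $\dot a/a = 2/(3t)$ and $4\pi G\bar\rho = \frac{3}{2}(\dot a/a)^2$. It starts on the growing mode and also tracks $a^2\bar\rho\,\delta \propto t^{-2/3}\delta$, which sets $\Psi$ if the Poisson equation is taken in the comoving form $\nabla^2\Psi = 4\pi G a^2\bar\rho\,\delta$. The helper names `growth_rhs` and `rk4` are hypothetical.

```python
def growth_rhs(t, y):
    """Equation (9) as a first-order system, y = (delta, d delta/dt).

    Assumes a flat, matter-dominated background with a ~ t^(2/3), so that
    H = (da/dt)/a = 2/(3t) and 4 pi G rho_bar = (3/2) H^2.
    """
    delta, ddelta = y
    hubble = 2.0 / (3.0 * t)
    four_pi_g_rho = 1.5 * hubble ** 2
    return (ddelta, -2.0 * hubble * ddelta + four_pi_g_rho * delta)


def rk4(rhs, t0, y0, t1, steps):
    """Fixed-step fourth-order Runge-Kutta integration of dy/dt = rhs(t, y)."""
    h = (t1 - t0) / steps
    t, y = t0, tuple(y0)
    for _ in range(steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, tuple(yi + h / 2 * ki for yi, ki in zip(y, k1)))
        k3 = rhs(t + h / 2, tuple(yi + h / 2 * ki for yi, ki in zip(y, k2)))
        k4 = rhs(t + h, tuple(yi + h * ki for yi, ki in zip(y, k3)))
        y = tuple(yi + h / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
                  for yi, a1, a2, a3, a4 in zip(y, k1, k2, k3, k4))
        t += h
    return y


# Growing mode delta = t^(2/3): delta(1) = 1, d delta/dt at t = 1 is 2/3.
t_start, t_end = 1.0, 8.0
delta_end, _ = rk4(growth_rhs, t_start, (1.0, 2.0 / 3.0), t_end, 20000)

# Psi is proportional to a^2 rho_bar delta ~ t^(4/3) t^(-2) delta = t^(-2/3) delta.
psi_start = t_start ** (-2.0 / 3.0) * 1.0
psi_end = t_end ** (-2.0 / 3.0) * delta_end

print(f"delta(8)/delta(1) = {delta_end:.6f}  (growing mode: 8^(2/3) = 4)")
print(f"Psi(8)/Psi(1)     = {psi_end / psi_start:.6f}")
```

Starting instead from a generic initial slope excites the decaying $t^{-1}$ mode as well, which becomes irrelevant at late times.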

3.2. THE PHOTON-BARYON FLUID 

Now that we know what the gravitational potential is doing, we are ready 
to study the evolution of the photons and baryons. We do this by mak- 
ing another approximation: we assume tight coupling between photons and 
baryons. Specifically, we assume that the mean free time $\tau$ between photon 
collisions is small compared to the other important time scales: 

$$\tau \ll H^{-1},\ (ck)^{-1},\ (c_s k)^{-1}. \qquad (10)$$

Here $H^{-1}$ is the expansion time scale, $(ck)^{-1}$ is the light-travel time across 
a Fourier mode, and $(c_s k)^{-1}$ is the sound-travel time across a mode ($c_s$ 
being the sound speed). This is an excellent approximation right up until 
around the time of last scattering. 

In the tight-coupling approximation, frequent scattering isotropizes the 
photon distribution function $f_\gamma$: at any particular point, $f_\gamma$ is isotropic in 
the rest frame of the baryons at that point. In fact, $f_\gamma$ is completely 
characterized by the temperature distribution. Furthermore, the photon and 
baryon densities are coupled adiabatically: $n_\gamma \propto n_B \propto T^3$. The behavior 
of the photon-baryon fluid is therefore characterized by a single variable: 
if we know, say, $\delta_B(\mathbf{x},t)$, we can determine $\mathbf{v}_B$, $T$, and $f_\gamma$. We will find it 
convenient to take as our variable the fractional temperature fluctuation, 
which is simply one third of the baryon density fluctuation: 



$$\Theta(\mathbf{x},t) = \frac{\Delta T}{T}(\mathbf{x},t) = \frac{1}{3}\delta_B(\mathbf{x},t). \qquad (11)$$

With these approximations, the dynamics of the photon-baryon fluid is 
described by the single equation 

$$\frac{d}{d\eta}\left[(1+R)\dot\Theta\right] + \frac{k^2}{3}\Theta = F(\eta). \qquad (12)$$



This equation comes from the Euler and continuity equations for the fluid. 
We are working in units in which $c = 1$. For a derivation of this equation, 
see Hu (1995). In this equation, $\eta$ is the conformal time, 

$$\eta = \int^t \frac{dt'}{a(t')}, \qquad (13)$$

and $R = 3\rho_B/4\rho_\gamma$ is essentially the baryon-to-photon energy ratio. The 
overdot denotes a derivative with respect to conformal time. This equation 
is in Fourier space, so $\Theta = \Theta_{\mathbf{k}}$ represents a single Fourier mode with 
wavenumber $k$.^ The right-hand side $F(\eta)$ is a gravitational driving term, 

$$F(\eta) = -\frac{k^2}{3}(1+R)\Psi - \frac{d}{d\eta}\left[(1+R)\dot\Phi\right]. \qquad (14)$$

The rest of this section will be devoted almost entirely to a discussion of 
the solution of equation (12). We begin by making some useful observations. 
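First, 

$$R \equiv \frac{3\rho_B}{4\rho_\gamma} \approx 30\,\Omega_B h^2\left(\frac{z}{10^3}\right)^{-1}, \qquad (15)$$

where $h$ is the Hubble parameter in units of $100\ {\rm km\,s^{-1}\,Mpc^{-1}}$, $z$ is the 
redshift, and $\Omega_B$ is the baryonic contribution to the density parameter. So 
for standard recombination at $z \sim 1000$ and baryon densities around the 
nucleosynthesis range, $R \sim 1/2$ at the time of last scattering. 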
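For a quick sense of the numbers, the sketch below evaluates $R = 3\rho_B/4\rho_\gamma$ using only $\rho_B \propto (1+z)^3$ and $\rho_\gamma \propto (1+z)^4$. The present-day photon density $\Omega_\gamma h^2 \approx 2.47\times10^{-5}$ (for $T_{\rm CMB} \approx 2.725$ K) is an input we have assumed, the function name is hypothetical, and the sampled values of $\Omega_B h^2$ are merely illustrative.

```python
# Assumed present-day photon density parameter for T_CMB ~ 2.725 K.
OMEGA_GAMMA_H2 = 2.47e-5


def baryon_photon_ratio(z, omega_b_h2):
    """R = 3 rho_B / (4 rho_gamma) at redshift z.

    rho_B scales as (1+z)^3 and rho_gamma as (1+z)^4, so R is proportional
    to Omega_B h^2 / (1+z).
    """
    return 0.75 * omega_b_h2 / OMEGA_GAMMA_H2 / (1.0 + z)


for omega_b_h2 in (0.01, 0.0125, 0.02):
    print(f"Omega_B h^2 = {omega_b_h2}: R(z = 1000) = "
          f"{baryon_photon_ratio(1000.0, omega_b_h2):.2f}")
```

For these illustrative baryon densities it gives $R$ between roughly 0.3 and 0.6 at $z = 1000$.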

With the approximations that we're making, there are no anisotropic 
stresses, so the two gravitational potentials are simply related to each other: 

$$\Phi = -\Psi. \qquad (16)$$

Furthermore, we have seen that during the matter-dominated epoch, if 
linear theory is valid, $\Psi$ is independent of time. The gravitational driving 
term therefore simplifies to 

$$F(\eta) = -\frac{k^2}{3}(1+R)\Psi. \qquad (17)$$

^It has become standard practice in cosmology to denote functions and their Fourier 
transforms by the same symbol, relying on context to tell the difference. [For the only 
recent exception I know about, see Tegmark (1996c).] Odious as this practice is, I have 
bowed to convention in these lectures. 
3.3. ACOUSTIC OSCILLATIONS 

To develop an intuitive feel for the solutions to equation (12), we will start 
by making some excessive and unwarranted approximations. We will then 
gradually relax those approximations to get a more accurate picture. First, 
let's assume that $R$ and $\Psi$ are independent of time. Then 

$$(1+R)\ddot\Theta + \frac{k^2}{3}\Theta = -\frac{k^2}{3}(1+R)\Psi. \qquad (18)$$

This is the equation for a simple harmonic oscillator, with solution 

$$\Theta(\eta) = -(1+R)\Psi + K_1\cos(kc_s\eta) + K_2\sin(kc_s\eta). \qquad (19)$$

Here $K_1$ and $K_2$ are constants to be fixed by the initial conditions and 
$c_s = [3(1+R)]^{-1/2}$ is the sound speed. In this approximation, then, each 
Fourier mode represents an acoustic plane wave propagating at speed $c_s$. 

There is a simple physical picture underlying this result. The baryon- 
photon fluid wants to fall into the potential wells, but it is supported by 
radiation pressure. The balance between pressure and gravity sets up acous- 
tic oscillations. The three terms in equation (12) come from the inertia of 
the fluid, the radiation pressure, and the gravitational field. 

In fact, let's make things even simpler and set R = 0. Then 

$$\Theta(\eta) = -\Psi + K_1\cos(kc_s\eta) + K_2\sin(kc_s\eta). \qquad (20)$$

In many theories, the initial perturbation is adiabatic, meaning that the 
matter and radiation fluctuations are the same at any particular point. 
With these initial conditions, $\dot\Theta = 0$ at very early times, and $\Theta(0) = -2\Psi/3$, 
so 

$$\Theta(\eta) = -\Psi + \frac{\Psi}{3}\cos(kc_s\eta). \qquad (21)$$

Continuing to focus our attention on a single Fourier mode, let us de- 
termine what kind of anisotropy we would expect to see on the sky. As we 
have mentioned, the three sources of primary anisotropy are gravity, the 
Doppler effect, and intrinsic temperature variations, 

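$$\frac{\Delta T}{T} = \left[\Psi + \hat{\mathbf{r}}\cdot\mathbf{v} + \Theta\right]_{\eta = \eta_{\rm LS}}, \qquad (22)$$

where $\eta_{\rm LS}$ is the time of last scattering and $\hat{\mathbf{r}}$ is a unit vector in the direction 
of observation. 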

Ignoring the Doppler term for the moment, note that the other two 
terms give a pure cosine oscillation. 



$$\Psi + \Theta = \frac{\Psi}{3}\cos(kc_s\eta), \qquad (23)$$

so the r.m.s. $\Delta T/T$ is large when $kc_s\eta_{\rm LS}$ is an integer multiple of $\pi$. Therefore, 
if the initial conditions have a smooth power spectrum, $\Delta T/T$ will 
have a harmonic series of peaks in $k$-space, leading to a harmonic series in 
the angular power spectrum of anisotropy on the sky. This is the origin of 
the so-called "Doppler peaks" in Figure 1. Ironically, the peaks have noth- 
ing to do with the Doppler effect. In fact, the peaks are caused by modes 
that have reached maxima of compression and rarefaction at the time of 
last scattering; the Doppler contribution to the anisotropy in these modes 
is zero! 

The first peak is caused by modes that have had time to oscillate through 
exactly one half of a period before last scattering; the modes that cause the 
second peak have oscillated through a full period, and so on. The physical 
scale of the first peak is therefore $\lambda \sim k^{-1} = c_s\eta_{\rm LS}/\pi \sim 30$ Mpc. The 
distance to the last-scattering surface is $D = \eta_0 - \eta_{\rm LS} \sim 6000$ Mpc, so the 
angular scale of the first peak is $\lambda/D \sim 0.25^\circ$. We will be more precise about 
the correspondence between physical scales and angular scales later. 
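The order-of-magnitude estimate above can be reproduced in a few lines. This sketch assumes, for illustration only, a flat matter-dominated universe, in which $\eta \propto a^{1/2}$ and $\eta_0 = 2c/H_0 \approx 6000\,h^{-1}$ Mpc, together with $R = 0$ (so $c_s = 1/\sqrt{3}$) and $z_{\rm LS} = 1100$; distances therefore come out in $h^{-1}$ Mpc.

```python
import math

# Illustrative assumptions: flat, matter-dominated expansion (eta ~ a^(1/2)),
# R = 0, and last scattering at z = 1100.  Lengths are in h^-1 Mpc with c = 1.
ETA_0 = 6000.0               # conformal age today, 2c/H_0
Z_LS = 1100.0
C_S = 1.0 / math.sqrt(3.0)   # sound speed for R = 0

eta_ls = ETA_0 / math.sqrt(1.0 + Z_LS)   # eta ~ a^(1/2) = (1+z)^(-1/2)
scale = C_S * eta_ls / math.pi          # lambda ~ 1/k with k c_s eta_LS = pi
distance = ETA_0 - eta_ls               # D = eta_0 - eta_LS
theta_deg = math.degrees(scale / distance)

print(f"eta_LS = {eta_ls:.0f}, lambda = {scale:.1f}, D = {distance:.0f} "
      f"(h^-1 Mpc); theta = {theta_deg:.2f} deg")
```

With these assumptions it gives $\lambda$ of about $33\,h^{-1}$ Mpc and an angle of about $0.3^\circ$, in line with the rough figures quoted above.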

Earlier, we threw out the Doppler term in equation (22) for no particular 
reason. We had better put it back. Using the continuity equation (1) and 
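the relation $\delta = 3\Theta$, we find that 

$$\mathbf{v} = \frac{3i\dot\Theta}{k}\hat{\mathbf{k}}. \qquad (24)$$

Here $\hat{\mathbf{k}}$ is a unit vector in the direction of $\mathbf{k}$, and $\mathbf{v}$ and $\Theta$ are still in Fourier 
space. Differentiating equation (21) and using the fact that $c_s = 1/\sqrt{3}$ for 
$R = 0$, we find that $\dot\Theta = -(k\Psi/3\sqrt{3})\sin(kc_s\eta)$. Since the r.m.s. value of $\hat{\mathbf{r}}\cdot\hat{\mathbf{k}}$ is 
$1/\sqrt{3}$, the r.m.s. Doppler contribution to equation (22) is 

$$\left.\frac{\Delta T}{T}\right|_{\rm Doppler} = -\frac{i}{3}\Psi\sin(kc_s\eta). \qquad (25)$$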

This has the same amplitude as the $(\Theta + \Psi)$ contribution, but is 90° out 
of phase in both time (it goes like a sine instead of a cosine) and space 
(it has an extra factor of $i$). This has the rather disastrous consequence of 
completely erasing the Doppler peaks: the total $\Delta T/T$ is the quadrature 
sum of (23) and (25): 

$$\left(\frac{\Delta T}{T}\right)^2 \propto \sin^2(kc_s\eta) + \cos^2(kc_s\eta) = 1. \qquad (26)$$



The k dependence, which led to the peaks, is gone. 

The problem, of course, is that we have taken our approximations too 
far. Specifically, the culprit is the limit $R \to 0$. Physically, taking the limit 
[Figure 2, panel (a): Acoustic Oscillations]




Figure 2. A simple mechanical model for a single mode of acoustic oscillation of the 
photon-baryon fluid. The behavior of the fluid inside of a potential well is shown; the 
behavior atop a potential hill would be the reverse. The springs represent the restoring 
force of the photon pressure and the balls represent the effective mass of the system. 
The top panel shows the case where the baryon contribution to the effective mass can be 
neglected, and the lower panel shows the effect of including baryons. Baryons increase the 
mass of the fluid, causing a displacement of the zero point of the oscillations. In addition, 
the sound speed is lowered. This has two effects, both of which may be seen in the plots on 
the right: baryons make the oscillations proceed more slowly and also reduce the Doppler 
contribution to $\Delta T/T$ relative to the intrinsic and Sachs-Wolfe contributions. Reprinted 
from Hu (1996). 

$R \to 0$ means ignoring the dynamical effects of the baryons. Let us remove 
that assumption, but keep the approximation that $R$ is time-independent. 
Then the solution for $\Theta(\eta)$ changes in two ways. The sound speed gets 
smaller by a factor $1/\sqrt{1+R}$, and the driving term $F(\eta)$ gets bigger by a 
factor $1 + R$. The adiabatic solution to equation (12) is now 

$$\Theta(\eta) = \frac{1}{3}(1+3R)\Psi\cos(kc_s\eta) - (1+R)\Psi. \qquad (27)$$

By allowing $R$ to be nonzero, we have increased the amplitude of the 
cosine oscillations by a factor $(1 + 3R)$. Furthermore, there is now an offset 
in the combined Sachs-Wolfe and adiabatic contributions to $\Delta T/T$: in the 
limit $R \to 0$, we found that $\Theta + \Psi$ oscillated symmetrically about zero; 
now it oscillates about $-R\Psi$. Most important, a nonzero $R$ reduces the 
[Figure 3, panel (a): $\Omega_B h^2$ Dependence; horizontal axis $l$]

Figure 3. Angular power spectra for CDM models with varying values of the baryon 
density $\Omega_B h^2$. Reprinted from Hu (1996). 

amplitude of the Doppler contribution to the anisotropy, relative to the 
Sachs-Wolfe contribution, since $|\mathbf{v}|$ is proportional to $c_s$ times the amplitude of $\Theta$, 
and $c_s$ has gotten smaller. Since the cosine oscillations are now larger in amplitude than the 
sine oscillations, we do indeed expect to see a series of peaks at $kc_s\eta_{\rm LS} = 
m\pi$. 
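To see this argument at work, the sketch below evaluates, for a single mode, the quadrature sum of the $(\Theta+\Psi)$ term from equation (27) and the r.m.s. Doppler term obtained from equation (24), as a function of $x = kc_s\eta_{\rm LS}$ with $R$ held constant. It is a toy model under exactly the approximations of this section (no driving, no damping, no projection), and the function name `mode_power` is our own.

```python
import math


def mode_power(x, R, psi=1.0):
    """Toy (Delta T/T)^2 for one mode at last scattering, x = k c_s eta_LS.

    Adds in quadrature the Theta + Psi term from equation (27) and the r.m.s.
    Doppler term from equation (24); R is treated as constant.
    """
    c_s = 1.0 / math.sqrt(3.0 * (1.0 + R))
    # Equation (27) plus Psi: oscillation of amplitude (1+3R)Psi/3 about -R Psi.
    sw_plus_intrinsic = (1.0 + 3.0 * R) * psi * math.cos(x) / 3.0 - R * psi
    # |v| = 3|dTheta/deta|/k = (1+3R) Psi c_s |sin x|; r.m.s. of r.k is 1/sqrt(3).
    doppler = (1.0 + 3.0 * R) * psi * c_s * math.sin(x) / math.sqrt(3.0)
    return sw_plus_intrinsic ** 2 + doppler ** 2


for R in (0.0, 0.5):
    values = [mode_power(m * math.pi / 2.0, R) for m in range(1, 5)]
    print(f"R = {R}: " + "  ".join(f"{v:.3f}" for v in values))
```

For $R = 0$ the sum is flat in $x$, which is equation (26); for $R = 1/2$ the compressional extremum at $x = \pi$ carries far more power than the one at $x = 2\pi$, illustrating the odd/even asymmetry discussed below.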

Why does including the dynamical effect of the baryons effect these 
changes in the solution? The essential reason is that baryons contribute 
to the effective mass of the photon-baryon fluid, but not to the pressure. 
(This is clear from looking at equation (12): the first term, representing the 
effective mass, depends on R, but the second term, representing pressure 
support, does not.) The effect of the baryons, therefore, is to slow down 
the oscillations, and also to make the fluid fall deeper into the potential 
wells. This explains all three of the key effects we have just mentioned: the 
increased oscillation amplitude, the offset in the center of the oscillations, 
and the reduction in importance of the velocity term relative to the other 
terms. These effects are represented pictorially in Figure 2. 

Based on this analysis, we can predict that the height of the peaks 
in the CMB anisotropy spectrum should depend on the baryon density: 
the larger the baryon density, the larger R, and the greater the amplitude 
of the oscillations. Furthermore, because of the offset in the oscillations, 
we expect the odd-numbered peaks to be enhanced relative to the even- 
numbered ones. (In the language of Figure 2, the compressions produce 
larger anisotropies than the rarefactions. Of course, if we had chosen to 
draw a potential peak instead of a potential well in Figure 2, we would 
Figure 4. Angular power spectra for CDM models with varying values of $h$. All of 
these models have $\Omega_0 = 1$. For lower values of $\Omega_0 h^2$, matter domination occurs later. 
The driving effect of the decay in the gravitational potential is therefore more significant, 
increasing the peak height. Reprinted from Hu (1996). 



make precisely the opposite statement.) 

Both of these effects are found in detailed calculations and can be seen 
in Figure 3. 

We can make further refinements to these approximations without too 
much difficulty. For instance, we can allow R to vary with time. The time 
scale on which R varies is of order a Hubble time and is much longer than 
the period of the acoustic waves. We can therefore treat the variation of R 
(and the concomitant variation in $c_s$) in the WKB approximation. There 
are two main results. First, the phase of the oscillation changes from $kc_s\eta$ 
to $k\int c_s\,d\eta$. Second, the amplitude of the oscillations decreases slowly with time, in 
proportion to $c_s^{1/2}$, or $(1+R)^{-1/4}$.^ 
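The $(1+R)^{-1/4}$ scaling can be checked by brute force. The sketch below integrates the homogeneous part of equation (12) (driving term dropped) with a hypothetical, slowly increasing $R(\eta) = 0.1\,\eta$ and $k = 100$ in arbitrary units, then compares the change in oscillation amplitude with the adiabatic-invariant prediction. The choice of $R(\eta)$, the parameter values, and the function names are assumptions made only so that $R$ changes slowly compared with the oscillation period.

```python
import math


def amplitude_ratio(k, r_of_eta, eta_end, steps):
    """Integrate d/deta[(1+R) dTheta/deta] + (k^2/3) Theta = 0 by RK4.

    The driving term F is dropped, so only the slow change in R alters the
    amplitude.  Variables are Theta and P = (1+R) dTheta/deta.  Returns the
    ratio of the final to the initial oscillation amplitude.
    """
    def rhs(eta, theta, p):
        return p / (1.0 + r_of_eta(eta)), -(k * k / 3.0) * theta

    def amplitude(eta, theta, p):
        # A^2 = Theta^2 + (dTheta/deta)^2 / omega^2 with omega^2 = k^2 / (3(1+R)).
        return math.sqrt(theta ** 2 + 3.0 * p ** 2 / (k * k * (1.0 + r_of_eta(eta))))

    h = eta_end / steps
    eta, theta, p = 0.0, 1.0, 0.0
    a_start = amplitude(eta, theta, p)
    for _ in range(steps):
        k1t, k1p = rhs(eta, theta, p)
        k2t, k2p = rhs(eta + h / 2, theta + h / 2 * k1t, p + h / 2 * k1p)
        k3t, k3p = rhs(eta + h / 2, theta + h / 2 * k2t, p + h / 2 * k2p)
        k4t, k4p = rhs(eta + h, theta + h * k3t, p + h * k3p)
        theta += h / 6 * (k1t + 2 * k2t + 2 * k3t + k4t)
        p += h / 6 * (k1p + 2 * k2p + 2 * k3p + k4p)
        eta += h
    return amplitude(eta, theta, p) / a_start


def r_linear(eta):
    """Hypothetical slowly growing R: from 0 at eta = 0 to 1 at eta = 10."""
    return 0.1 * eta


measured = amplitude_ratio(100.0, r_linear, 10.0, 20000)
predicted = (1.0 + r_linear(10.0)) ** -0.25      # (1+R)^(-1/4) with R: 0 -> 1
print(f"measured {measured:.4f}, WKB prediction {predicted:.4f}")
```

Because $R$ grows by only about 0.1 per unit $\eta$ while the mode completes several oscillations in that interval, the measured ratio tracks the prediction $2^{-1/4}$ closely.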

3.4. DRIVING 

We can also relax the approximation that F(rj) is constant in time. This 
has interesting consequences. A constant term on the right-hand side of an 
oscillator equation merely offsets the center of the oscillations; in contrast, a 
time-varying term genuinely drives oscillations. In particular, if the driving 

^The easiest way to see this is to note that $m\omega A^2$ is an adiabatic invariant for a 
harmonic oscillator. Here $m$ is the mass, $A$ is the amplitude, and $\omega$ is the frequency. Of 
course, the result can also be derived directly from the WKB approximation. 
[Figure 5, panel (b): Isocurvature]




Figure 5. Driving effects on the acoustic oscillations. (a) In the adiabatic case, the 
gradual decay in the potential causes a relatively small increase in the amplitude of 
the oscillations. (b) For isocurvature initial conditions, the initial perturbation $\Theta + \Psi$ is 
zero, and the growth (and subsequent decay) of $\Psi$ is entirely responsible for driving the 
oscillations. Reprinted from Hu, Sugiyama, & Silk (1996). 

term varies significantly on a time scale comparable to the period of the 
oscillations, resonant driving can occur. 

We have seen that $\Psi$ (and hence $F$) is constant during matter domination, 
but it decays during the radiation epoch. For modes that enter the 
horizon before matter domination, $\Psi$ decays while that mode is undergoing 
its oscillations. The decay in $\Psi$ therefore boosts the amplitude of those 
short-wavelength modes. The modes that receive the largest boost are those 
that entered the horizon before matter-radiation equality at a redshift 

$$z_{\rm eq} = 24000\,\Omega_0 h^2. \qquad (28)$$

These modes are characterized by wavenumbers 

$$k > k_{\rm eq} = (14\ {\rm Mpc})^{-1}\,\Omega_0 h^2. \qquad (29)$$
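Equations (28) and (29) follow from the radiation density alone. As a rough check, the sketch below assumes $T_{\rm CMB} \approx 2.725$ K (so $\Omega_\gamma h^2 \approx 2.47\times10^{-5}$) and three massless neutrino species, sets $1 + z_{\rm eq} = \Omega_0/\Omega_r$, and takes $k_{\rm eq} = a_{\rm eq}H(a_{\rm eq})/c$ for a matter-plus-radiation expansion rate. These inputs and the function names are our assumptions, not values quoted in the text.

```python
import math

# Assumed inputs: T_CMB ~ 2.725 K and three massless neutrino species.
OMEGA_GAMMA_H2 = 2.47e-5                                          # photons
NEUTRINO_RATIO = 3.0 * (7.0 / 8.0) * (4.0 / 11.0) ** (4.0 / 3.0)  # rho_nu / rho_gamma
OMEGA_R_H2 = OMEGA_GAMMA_H2 * (1.0 + NEUTRINO_RATIO)              # total radiation
HUBBLE_DISTANCE_MPC = 2997.9                                      # c / (100 km/s/Mpc)


def z_equality(omega0_h2):
    """Matter-radiation equality: Omega_0 / (Omega_r (1+z)) = 1."""
    return omega0_h2 / OMEGA_R_H2 - 1.0


def k_equality(omega0_h2):
    """k_eq = a_eq H(a_eq) / c in Mpc^-1.

    With H^2 = H_0^2 (Omega_0 a^-3 + Omega_r a^-4) (curvature and vacuum energy
    are negligible at equality) and a_eq = Omega_r / Omega_0, this reduces to
    k_eq = (H_0 / c) Omega_0 sqrt(2 / Omega_r); the factors of h cancel when
    the densities are written as Omega h^2.
    """
    return omega0_h2 * math.sqrt(2.0 / OMEGA_R_H2) / HUBBLE_DISTANCE_MPC


for omega0_h2 in (1.0, 0.25):   # h = 1 and h = 0.5 with Omega_0 = 1
    print(f"Omega_0 h^2 = {omega0_h2}: z_eq = {z_equality(omega0_h2):.0f}, "
          f"1/k_eq = {1.0 / k_equality(omega0_h2):.1f} Mpc")
```

It gives $z_{\rm eq} \approx 2.4\times10^4\,\Omega_0h^2$ and $k_{\rm eq}^{-1} \approx 13.7\,(\Omega_0h^2)^{-1}$ Mpc, matching equations (28) and (29) to the precision quoted there.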

The effect of the driving term becomes evident if we look at power 
spectra for critical-density models with different values of the Hubble pa- 
rameter: for low h, matter domination occurs later and the boosting effect 
is greater. This effect is shown in Figure 4. 
Figure 6. The time evolution of a single Fourier mode. $a$ is the scale factor, normalized 
to unity today, and $a_*$ is the scale factor at recombination. At early times potential decay 
increases the amplitude of the oscillations. The heights of the positive and negative peaks 
are offset by $-R\Psi$ with respect to each other. The decline in amplitude at late times is 
due to diffusion damping. Reprinted from Hu (1996). 

We have been focusing on models with adiabatic initial conditions. If we 
instead consider isocurvature models, the effect of the driving term becomes 
even more evident. In isocurvature models, the total density perturbation 
vanishes at early times: 

$$\delta\rho_{\rm tot} = \delta\rho_B + \delta\rho_\gamma + \delta\rho_{\rm CDM} + \dots = 0. \qquad (30)$$

Clearly the potentials vanish initially in these models, $\Phi(0) = \Psi(0) = 0$. As time passes, the 
radiation redshifts away, leaving genuine density perturbations and hence nonzero potentials $\Phi$ and 
$\Psi$. Oscillations are therefore driven in $\Theta$. In contrast to the adiabatic 
case, these isocurvature oscillations are proportional to $\sin(kc_s\eta)$ rather than 
$\cos(kc_s\eta)$. The peaks in an isocurvature spectrum are therefore different in 
phase from adiabatic peaks. The peak locations in the CMB anisotropy 
spectrum can distinguish quite robustly between adiabatic and isocurva- 
ture models. Figure 5 illustrates the origin of the peaks in isocurvature 
models. 

3.5. DAMPING 

We have been assuming so far that the tight-coupling approximation holds 
perfectly right up until the moment $\eta_{\rm LS}$, and that the photons are 
instantaneously released at that moment. In fact, the failure of the tight-coupling 
approximation, especially around the time of last scattering, causes 
significant damping of fluctuations as photons diffuse out of hot, overdense 
regions. Furthermore, the last-scattering "surface" is really a shell of some 
thickness. Oscillations on scales smaller than this thickness do not show up 
as observable anisotropies on the sky, since any particular line of sight will 
look at multiple peaks and troughs of that mode. 

To get a rough estimate of the importance of diffusion damping (also 
known as Silk damping), consider a photon undergoing a random walk 
through the photon-baryon fluid. If the mean free path is $\lambda$, then at a 
time $\eta$, a typical photon has scattered about $N \sim \eta/\lambda$ times and has 
diffused through a distance $\lambda_D \sim \sqrt{N}\lambda \sim \sqrt{\eta\lambda}$. If a particular Fourier 
mode has a wavelength less than this diffusion length, then the photons 
will have diffused from overdense to underdense regions, and the mode will 
be damped away. Diffusion damping thus occurs for modes with $k^{-1} \lesssim \lambda_D$. 
Most of the damping occurs around the time of last scattering, since that 
is when the mean free path $\lambda$ becomes large. 
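The $\sqrt{N}$ scaling behind the diffusion length is that of any random walk. The Monte Carlo sketch below is an illustration we have added, not part of the original analysis: a fixed step length stands in for the mean free path $\lambda$, isotropic directions are drawn from normalized Gaussian vectors, and the function name is hypothetical. It checks that the r.m.s. displacement after $N$ scatterings is close to $\sqrt{N}\lambda$.

```python
import math
import random


def rms_displacement(n_steps, mean_free_path=1.0, walkers=500, seed=1):
    """Monte Carlo r.m.s. distance of isotropic 3D random walks of n_steps."""
    rng = random.Random(seed)   # fixed seed for reproducibility
    total_r2 = 0.0
    for _ in range(walkers):
        x = y = z = 0.0
        for _ in range(n_steps):
            # A normalised Gaussian vector gives an isotropic direction.
            gx, gy, gz = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            norm = math.sqrt(gx * gx + gy * gy + gz * gz)
            x += mean_free_path * gx / norm
            y += mean_free_path * gy / norm
            z += mean_free_path * gz / norm
        total_r2 += x * x + y * y + z * z
    return math.sqrt(total_r2 / walkers)


for n in (25, 100, 400):
    print(f"N = {n:3d}: rms = {rms_displacement(n):6.2f}, sqrt(N) = {math.sqrt(n):6.2f}")
```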

In Figure 6 we show the time evolution of a particular mode, including 
the damping at the end, and in Figure 10 below we show the net effect of 
diffusion damping on a CMB power spectrum. 

3.6. PROJECTION 

In order to complete the story of primary anisotropies, we need to specify
precisely how a particular plane wave is projected onto a specific angular
scale on the sky. It is clear that a mode with wavelength $\lambda$ will show up on
an angular scale $\theta \sim \lambda/R$, where $R$ is the distance to the last-scattering
surface, or in other words, a mode with wavenumber $k$ shows up at multipoles
$l \approx kR$. Consequently, tilting the spectral index $n$ of the primordial matter
power spectrum essentially just tilts the angular power spectrum. Let us now
make this rough observation mathematically precise.

If we are looking in a direction $\hat r$ in the sky, then (ignoring the thickness
of the last-scattering surface) the anisotropy we see is simply
$\Delta T/T(\hat r) = \Theta^{(\rm tot)}(R\hat r)$, where $\Theta^{(\rm tot)}$ includes all three terms in
equation (22). For a single Fourier mode, this is simply

$$\frac{\Delta T}{T}(\hat r) = \Theta^{(\rm tot)}_{\mathbf k}\exp(i\mathbf k\cdot\hat r R). \tag{31}$$

To quantify the amount of power this produces on different angular scales,
we expand in spherical harmonics $Y_{lm}$. The relevant identity is (Jackson
1975)

$$e^{ikR\,\hat{\mathbf k}\cdot\hat r} = 4\pi\sum_{l,m} i^l j_l(kR)\,Y^*_{lm}(\hat{\mathbf k})\,Y_{lm}(\hat r). \tag{32}$$
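Summing equation (32) over $m$ with the spherical-harmonic addition theorem, $\sum_m Y^*_{lm}(\hat{\mathbf k})Y_{lm}(\hat r) = \frac{2l+1}{4\pi}P_l(\hat{\mathbf k}\cdot\hat r)$, gives Rayleigh's expansion $e^{ix\mu} = \sum_l (2l+1)\,i^l j_l(x)P_l(\mu)$. The sketch below checks it numerically with SciPy's `spherical_jn` and `eval_legendre`; the values $x = kR = 50$ and $\mu = 0.3$ are arbitrary illustrative choices, and `plane_wave_series` is a hypothetical helper.

```python
import numpy as np
from scipy.special import spherical_jn, eval_legendre

def plane_wave_series(x, mu, l_max):
    """Partial sum of sum_l (2l+1) i^l j_l(x) P_l(mu), the m-summed form of eq. (32)."""
    l = np.arange(l_max + 1)
    terms = (2 * l + 1) * (1j ** l) * spherical_jn(l, x) * eval_legendre(l, mu)
    return terms.sum()

x, mu = 50.0, 0.3            # x = kR, mu = cosine of the angle between k-hat and r-hat
exact = np.exp(1j * x * mu)
for l_max in (20, 50, 80, 100):
    print(l_max, abs(plane_wave_series(x, mu, l_max) - exact))
```

The error stays of order unity until $l_{\max}$ exceeds $kR$ and then drops rapidly, because $j_l(kR)$ is exponentially small for $l \gg kR$: a plane wave with wavenumber $k$ carries essentially no power above $l \approx kR$.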
Figure 7. The quantity $(2l+1)j_l^2(kR)$ is plotted for $l = 30$, $l = 60$, and $l = 90$. This
quantity determines how much power a Fourier mode with wavenumber $k$ contributes to
multipole $l$. Note that, while most of the power is deposited at $l \sim kR$, there is significant
"bleeding" to lower $l$.



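Combining equations (31) and (32), we find that

$$\frac{\Delta T}{T}(\hat r) = \sum_{l,m} a_{lm}Y_{lm}(\hat r), \tag{33}$$

where

$$a_{lm} = 4\pi\,\Theta^{(\rm tot)}_{\mathbf k}\, i^l j_l(kR)\,Y^*_{lm}(\hat{\mathbf k}). \tag{34}$$

The total power produced by this mode in the multipole $l$ is

$$a_l^2 \equiv \sum_{m=-l}^{l}|a_{lm}|^2 = 4\pi(2l+1)\left|\Theta^{(\rm tot)}_{\mathbf k}\right|^2 j_l^2(kR). \tag{35}$$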

The spherical Bessel function $j_l(x)$ peaks at $x \sim l$, so a single Fourier mode
$k$ does indeed contribute most of its power around multipole $l_k = kR$,
as expected. However, as Figure 7 shows, $j_l$ does have significant power
beyond the first peak, meaning that the power contributed by a Fourier
mode "bleeds" to $l$ values lower than $l_k$. This is because a mode
appears to have a longer wavelength when viewed along a line of
sight nearly perpendicular to the wavevector.
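Equation (35) can be read as a distribution of one mode's power over multipoles: the weights $(2l+1)j_l^2(kR)$ sum to one over all $l$, so the factor $4\pi|\Theta^{(\rm tot)}_{\mathbf k}|^2$ is the mode's total power. The short sketch below, with $kR = 60$ chosen arbitrarily and a hypothetical helper `power_per_multipole`, shows both the peak just below $l \approx kR$ and how much power bleeds to lower $l$.

```python
import numpy as np
from scipy.special import spherical_jn

def power_per_multipole(kR, l_max=None):
    """Fraction of a single mode's power in each multipole: (2l+1) j_l(kR)^2.

    This is eq. (35) divided by 4*pi*|Theta_k|^2; the weights sum to 1 over l.
    """
    if l_max is None:
        l_max = int(2 * kR) + 50      # j_l(kR) is negligible well beyond l = kR
    l = np.arange(l_max + 1)
    return l, (2 * l + 1) * spherical_jn(l, kR) ** 2

kR = 60.0
l, w = power_per_multipole(kR)
print("total:", w.sum())
print("peak at l =", l[np.argmax(w)])
print("fraction at l < kR:", w[l < kR].sum())
```

Most of the power lands below $l = kR$, spread over a long low-$l$ tail: a sharp feature in $k$ is therefore smeared in $l$, which is one reason the acoustic peaks in $C_l$ are broader than the underlying oscillations.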
Figure 8. $\Omega_0$-dependence of the angular power spectrum. In open models, the angular-diameter
distance to the last-scattering surface is large, so the features in the power
spectrum are shifted to small angular scales. In a flat model with a cosmological constant,
the distance to the last-scattering surface is larger than in an $\Omega_0 = 1$ model, but
the size of the sound horizon also increases, producing little net effect on the location
of the peaks. The structure at low $l$ in the low-density models is due to the integrated
Sachs-Wolfe effect. Reprinted from Hu & White (1996).

These formulae assume that the Universe is spatially flat. If there is
curvature, then the correspondence between physical scales at last scattering
and angular scales on the sky changes. In an open Universe, for example,
geodesics diverge (relative to the flat case) in such a way that a particular
angular scale corresponds to a much larger physical scale on the
last-scattering surface. A particular Fourier mode in an open Universe
projects to multipoles $l \approx kR_A$, where $R_A$ is the angular-diameter distance
to the last-scattering surface, given by

$$R_A = |K|^{-1/2}\sinh\left(|K|^{1/2}R\right). \tag{36}$$

Here $K$ is the curvature. When $|K|$ is small, $R_A \approx R$, but for large $|K|$
(low $\Omega_0$), $R_A$ grows exponentially with metric distance.
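Equation (36) is easy to evaluate directly. This sketch works in units where the comoving distance $R = 1$, takes the curvature only through $|K|$ (so it covers the open case discussed here), and uses an arbitrary wavenumber $k = 100/R$ to show how the same mode moves to higher $l \approx kR_A$ as $|K|$ grows. The function name `angular_diameter_distance` is hypothetical.

```python
import numpy as np

def angular_diameter_distance(R, abs_K):
    """Eq. (36): R_A = |K|^(-1/2) sinh(|K|^(1/2) R) for an open universe.

    R is the comoving (metric) distance and abs_K = |K| the curvature magnitude,
    in mutually consistent units; abs_K = 0 returns the flat result R_A = R.
    """
    if abs_K == 0:
        return R
    s = np.sqrt(abs_K)
    return np.sinh(s * R) / s

R, k = 1.0, 100.0                      # illustrative: kR = 100, so l ~ 100 if flat
for abs_K in (0.0, 1e-6, 1.0, 25.0):
    R_A = angular_diameter_distance(R, abs_K)
    print(f"|K| = {abs_K:g}: R_A/R = {R_A / R:.4f}, l ~ k R_A = {k * R_A:.0f}")
```

For small $|K|$ the ratio $R_A/R$ is indistinguishable from one, while for $|K|^{1/2}R$ of a few it grows exponentially, pushing every feature of the spectrum to much higher $l$.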

This projection effect is easy to see in predictions of the CMB anisotropy.
In an open Universe, features such as the acoustic peaks and the damping
scale are shifted towards smaller angular scales, i.e., towards higher $l$. (See
Figure 8.)

Note that the approximate linear relation between $l$ and $k$ holds only for
primary anisotropies. The secondary anisotropies, which we discuss below,
tend to occur at a wide range of distances (in contrast to the relatively thin
last-scattering surface). Thus for secondary anisotropies, each $k$-mode can
contribute to a wide range of $l$'s.

4. Secondary Anisotropies 

After last scattering, the photons and baryons are no longer tightly coupled. 
In fact, if the effects of reionization are negligible, there is no coupling at all. 
In this case, the photons simply propagate freely along spacetime geodesics
from last scattering to the observer. The causes of secondary anisotropy are 
then entirely gravitational, the dominant effect being the ISW effect. Weak 
gravitational lensing can also distort the anisotropy spectrum, although 
this effect is generally small. 

If the intergalactic medium reionized at a sufficiently early redshift, then 
some fraction of the photons will interact again after the time of "last" 
scattering. The main result is that primary fluctuations are erased, and in 
addition new fluctuations can be generated from the new last-scattering 
surface. However, the last-scattering surface in a reionized model is ex- 
tremely thick (since the photon-baryon coupling is weak), so the nature of 
the regenerated anisotropy is quite different from the primary anisotropy. 

4.1. INTEGRATED SACHS-WOLFE EFFECT

As Sachs & Wolfe (1967) showed, fluctuations in the spacetime curvature
produce CMB anisotropy in two distinct ways. The "ordinary" Sachs-Wolfe
effect is simply the gravitational red- or blueshift due to the potential
difference between the points of emission and reception of a photon. In
addition, if the gravitational potential changes with time, there is an
"integrated" Sachs-Wolfe effect.

Imagine a photon falling into a potential well, and then climbing out 
the other side. If the potential does not vary with time, the photon suffers 
no net change in energy. However, if the potential well decays while the 
photon is passing through it, then the redshift upon climbing out of the 
well is smaller than the blueshift upon falling in. The photon therefore 
gains energy. The magnitude of the ISW effect is given by an integral along 
the photon's path: 

$$\left(\frac{\Delta T}{T}\right)_{\rm ISW} = \int\left(\dot\Psi(\mathbf x,\eta) - \dot\Phi(\mathbf x,\eta)\right)d\eta. \tag{37}$$
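The "falling in and climbing out" argument can be checked with a toy evaluation of equation (37). Everything about the set-up is an assumption made for illustration: $\Phi = -\Psi$ (no anisotropic stress), so the integrand is $2\dot\Psi$; a single Gaussian well $\Psi = -A(\eta)\,e^{-x^2/2w^2}$ whose depth decays as $A \propto e^{-\Gamma\eta}$; and a photon moving at unit speed through its centre. The function `isw_shift` and its parameters are hypothetical.

```python
import numpy as np

def isw_shift(decay_rate, amplitude=1e-5, width=1.0, eta_cross=5.0, n=20_001):
    """Toy evaluation of eq. (37) for a photon crossing one potential well.

    Hypothetical set-up: Phi = -Psi (no anisotropic stress), so the integrand
    is 2 dPsi/deta; Psi(x, eta) = -A(eta) exp(-x^2 / 2 width^2) with
    A(eta) = amplitude * exp(-decay_rate * eta); the photon moves at unit
    speed, x = eta - eta_cross, so it reaches the well centre at eta_cross.
    """
    eta = np.linspace(0.0, 2.0 * eta_cross, n)
    x = eta - eta_cross
    dA_deta = -decay_rate * amplitude * np.exp(-decay_rate * eta)
    dpsi_deta = -dA_deta * np.exp(-x**2 / (2.0 * width**2))   # partial derivative at fixed x
    integrand = 2.0 * dpsi_deta                                # Psi_dot - Phi_dot with Phi = -Psi
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(eta))   # trapezoid rule

print(isw_shift(0.0))   # static well: no net shift
print(isw_shift(0.1))   # decaying well: positive Delta T / T, i.e. net energy gain
```

A static well gives exactly zero, because only the time derivative of the potential enters; a decaying well gives a positive shift, matching the statement that the photon gains energy.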

We observed earlier that the gravitational potential is time-independent 
if certain conditions are satisfied: 

• The Universe is matter-dominated ($\rho_{\rm matter} \gg \rho_{\rm rad}$).

• Spatial curvature is negligible ($\Omega = 1$).

• Linear perturbation theory is valid ($\delta \ll 1$).

If all of these conditions are satisfied, there is no ISW effect. However, in 
any realistic cosmological model some or all of these conditions are violated 
at some point. 

4.2. EARLY ISW EFFECT 

In a typical model, the epoch of matter-radiation equality occurs before the 
time of last scattering, but not long before. The matter-dominated limit is 
therefore not quite correct around the time of last scattering and shortly 
thereafter. The decay in the potential shortly after last scattering gives rise 
to the early ISW effect. This effect is largest when the matter density $\Omega_0 h^2$
is low.

The early ISW effect is most important on large scales. Specifically, the
scales that are most affected are those with $k^{-1}$ comparable to the time
scale on which the potential decays. Modes with wavelengths much shorter
than this oscillate many times while the potential is decaying, causing both
positive and negative ISW contributions, which tend to cancel each other
out. The time scale for potential decay is of order the horizon size at last
scattering, so the early ISW effect shows up on large angular scales, $l \lesssim 200$.

4.3. LATE ISW EFFECT 

In models with $\Omega_0 \neq 1$, the potential decays at late times, typically at
redshifts $z \lesssim \Omega_0^{-1}$. This potential decay, which occurs whether or not there
is a cosmological constant, gives rise to an ISW effect at late times. As with 
the early ISW effect, modes with wavelengths comparable to the time scale 
for the potential to decay are most affected. The relevant time scale is the 
horizon size at the time of potential decay, so the late ISW effect also leaves 
its imprint on large angular scales. 

4.4. OTHER ISW EFFECTS 

At very late times, nonlinear structure forms, causing the potential to grow 
with time. The ISW effect due to nonlinear structure is often called the 
Rees-Sciama effect (Rees & Sciama 1968). In standard models, the Rees- 
Sciama effect is typically much weaker than the other effects we have dis- 
cussed (Seljak 1996a). 

A background of primordial gravity waves, if there is one, produces its 
own ISW effect. Gravity waves redshift once they enter the horizon, so 
modes that enter the horizon well before last scattering leave no imprint on 
the CMB. The gravity-wave contribution to the CMB anisotropy therefore 
occurs on large angular scales, $l \lesssim 100$. Because of the quadrupolar nature
of the spacetime distortion caused by a gravity wave, the gravity-wave
contribution to the CMB quadrupole is enhanced relative to other modes.

There may be other sources of spacetime distortion besides linear den- 
sity fluctuations and gravity waves. In particular, topological defects cause 
spacetime curvature and hence an ISW effect. We will not discuss topolog- 
ical defects further; for more information, see Paul Shellard's contribution 
to this volume, and the references therein. 

4.5. GRAVITATIONAL LENSING 

The ISW effect may be thought of as gravity imparting a "kick" to a photon 
forward or backward along the direction of motion. Gravity can also kick 
the photons in the transverse directions, changing their directions of motion 
but not their energies. The result of this weak gravitational lensing is that 
our image of the last-scattering surface is slightly distorted, as if we were 
looking at it through an irregular refracting medium. This distortion of 
the last-scattering surface results in a slight smearing of the angular power 
spectrum, with power from the peaks being moved into the valleys. The 
effect is typically weak, resulting in changes at the few-percent level in the 
power spectrum (Seljak 1996b). 

4.6. REIONIZATION 

We will not undertake a detailed discussion of reionized models here. In- 
stead, we refer the interested reader to Roman Juszkiewicz's contribution 
to this volume and references therein. We will, however, make some general 
comments. 

The Gunn-Peterson test (Gunn & Peterson 1965) tells us that the
intergalactic medium is ionized out to redshifts of a few. In CDM-like models
of structure formation, reionization is generally thought to occur at such
moderate redshifts, with the formation of the earliest nonlinear structures.
If this is correct, then reionization does not dramatically alter the CMB
anisotropy predictions. If, on the other hand, reionization somehow
happened earlier, say at $z \sim 100$, then a significant fraction of the CMB
photons would have been scattered by the reionized matter after the so-called
epoch of last scattering.

The main effect of such early reionization is to erase anisotropy on de- 
gree scales. The reason is quite simple: if we have early reionization, then 
a photon that comes toward us from a particular direction need not have 
originated from that direction. Rather, as Figure 9 illustrates, each direc- 
tion on the sky contains photons that originate from a variety of different 
locations at the time of "last" scattering. In severely reionized models, the
peaks are completely washed away. Such models may already be ruled out
by degree-scale CMB experiments (Scott, Silk, & White 1995).

Figure 9. Our backward light cone. The vertical axis represents conformal time, and the
horizontal axes are two of the three spatial directions. In the absence of reionization, each
line of sight corresponds to a particular point on the last-scattering surface at $z \sim 1000$. In
a reionized model in which a typical photon last scattered at $z = 10$, a photon arriving
from a particular direction may have originated from any point in the shaded circle.
Reprinted from Tegmark (1996c).

Inhomogeneities and bulk motions of the reionized matter induce new
CMB anisotropies, which must generally be treated to second order in
perturbation theory (Ostriker & Vishniac 1986; Hu, Scott, & Silk 1994;
Dodelson & Jubas 1995), but we will not discuss these regenerated
anisotropies here. We also do not discuss the effect of nonuniform or patchy
reionization, including the Sunyaev-Zel'dovich effect (Sunyaev & Zel'dovich 1970).

5. Summary of Anisotropy Formation 

We have now concluded our tour of the mechanisms of anisotropy forma- 
tion. Figure 10 illustrates some of the key points. The dominant features 
in a typical CDM power spectrum are the peaks due to acoustic oscilla- 
tions of the photon-baryon fluid. The peaks correspond to modes that are 
undergoing maximum compression and rarefaction at the time of last scat- 
tering. Modes that are out of phase with these modes produce anisotropy 
via the Doppler effect, partially filling in the valleys between the acoustic 
peaks. The effect of damping on small scales is evident, and the rise at
$l \gtrsim 500$ in the undamped spectrum shows the driving effect of the decaying
gravitational potential at early times.

We can use what we have learned to determine how the predicted 
anisotropy spectrum should depend on the key cosmological parameters: 

• In models with spatial curvature ($\Omega_0 \neq 1$), the position of the acoustic
peaks shifts due to geodesic deviation. In addition, the late ISW effect
boosts the large-scale power.

• If there is a cosmological constant, then the position of the peaks shifts
slightly due to the increased distance to the last-scattering surface, and
again, the late ISW effect boosts the large-scale power.

• Lowering the Hubble parameter (for fixed $\Omega_0$) reduces the matter
density $\Omega_0 h^2$. The gravitational driving of oscillations is enhanced, and
the peaks increase in height.

• The higher the baryon density $\Omega_b h^2$, the greater the peak amplitude.
Odd-numbered peaks in particular are enhanced.

• The spectral index $n$ of the primordial power spectrum essentially just
tilts the angular power spectrum.

• If we add gravity waves to a model, we increase the quadrupole, and
in addition the whole "plateau" at low $l$ rises relative to the acoustic
peaks.

Figure 10. Analytic decomposition of anisotropies. The solid line shows the angular
power spectrum of a critical-density CDM model. The upper dashed curve shows the
spectrum that would be seen in the absence of diffusion damping. Note that the undamped
peak heights increase at scales small enough to have crossed the horizon before
matter-radiation equality. (See Section 3.4.) The other curves show the relative importance
of the Sachs-Wolfe, integrated Sachs-Wolfe, and Doppler ($\Theta_1$) contributions. Reprinted
from Hu (1996).

Although we have made many approximations in deriving these
conclusions, all of them are borne out by detailed Boltzmann calculations.
Because the CMB anisotropy predictions depend sensitively on the various
parameters, an experiment that could map out the acoustic peaks would
be able to measure these cosmological parameters accurately. The spatial
curvature in particular should be relatively easy to pick out, thanks to the
shift in position of the first peak. The relative positions of successive peaks
also provide a robust way of determining whether the initial conditions are
isocurvature or adiabatic.⁷ The dependence of the power spectrum on other
parameters such as $h$ and $\Omega_b$ is somewhat more subtle, but if we manage to
detect and measure the heights of two or three peaks, we should be able to
do quite well (Jungman et al. 1996), assuming, of course, that the general
paradigm sketched above is correct and the multiple peaks are really there.

6. Statistical Properties of $\Delta T/T$

Before we discuss methods for comparing theories with data, we need to
discuss briefly the statistical properties of the CMB anisotropy as it appears
on the sky. As we have mentioned, it is convenient to expand the observed
anisotropy in spherical harmonics:

$$\frac{\Delta T}{T}(\hat r) = \sum_{l,m} a_{lm}Y_{lm}(\hat r). \tag{38}$$

We have focused on the anisotropy produced by an individual plane wave;
the observed anisotropy is of course a superposition of contributions from
all of these plane waves:

$$a_{lm} = \sum_{\mathbf k} a_{lm}^{(\mathbf k)}. \tag{39}$$

Since all of the relevant physics is described by linear perturbation theory
(as we know, everything in nature is linear⁸), each $a_{lm}^{(\mathbf k)}$ is proportional
to the initial density perturbation $\delta_{\mathbf k}^{(\rm init)}$.

One often assumes that the initial conditions have "random phases,"
meaning that different Fourier modes are uncorrelated:

$$\langle\delta_{\mathbf k}\delta^*_{\mathbf k'}\rangle = 0 \quad\text{when } \mathbf k \neq \mathbf k'. \tag{40}$$

In this case, the $a_{lm}^{(\mathbf k)}$ are also uncorrelated. The mean-square power in a
particular multipole is then simply the sum of the contributions from the
various Fourier modes:

$$\langle|a_{lm}|^2\rangle = \sum_{\mathbf k}\left\langle\left|a_{lm}^{(\mathbf k)}\right|^2\right\rangle. \tag{41}$$

And, of course, the left-hand side of this equation is simply the angular
power spectrum $C_l$. (This quantity is independent of the azimuthal index
$m$ as long as space is isotropic.)

⁷ For isocurvature initial conditions, the second and third peaks occur at three and five
times the location of the first; in adiabatic models, they occur at two and three times.

⁸ To first order.

We often go beyond the assumption of random phases and assume
Gaussian initial conditions. This is a prediction of inflationary scenarios, but
one often assumes Gaussian initial conditions even in non-inflationary
phenomenological models such as isocurvature baryon models (Peebles 1987).
When we talk about a Gaussian theory, we simply mean that at some initial
time $t_i$ the density perturbation $\delta$ was a realization of a Gaussian random
field. Bernard Jones has provided a detailed discussion of Gaussian random
processes elsewhere in this volume; for our purposes, all we need to know is
that the assumption of Gaussian initial conditions, together with homogeneity
and isotropy, implies that each Fourier coefficient $\delta_{\mathbf k}$ is an independent
Gaussian random variable of zero mean.⁹ In other words, the real-space
density perturbation $\delta(\mathbf x)$ is a stochastic superposition of plane waves of
all different wavelengths. Since a Gaussian random variable is completely
determined by its mean and variance, and since $\langle\delta_{\mathbf k}\rangle = 0$, the statistical
properties of our Gaussian random field are completely determined by the
power spectrum $P(k) = \langle|\delta_{\mathbf k}|^2\rangle$.

If we assume Gaussian initial conditions, then each coefficient $a_{lm}$ is a
Gaussian random variable, since it is a linear combination of the Gaussian
variables $\delta_{\mathbf k}$. The statistical properties of $\Delta T/T$ are therefore completely
specified by the means,

$$\langle a_{lm}\rangle = 0, \tag{42}$$

and the covariances,

$$\langle a_{lm}a^*_{l'm'}\rangle = C_l\,\delta_{ll'}\delta_{mm'}, \tag{43}$$

of the coefficients $a_{lm}$. In other words, for Gaussian initial conditions, the
angular power spectrum $C_l$ tells us everything we need to know.

Even when the initial conditions are not Gaussian, it often suffices to
treat the CMB anisotropy as Gaussian, at least on sufficiently large angular
scales. The CMB fluctuation on large angular scales is typically due to
a superposition of many incoherent fluctuations. Even if the individual
fluctuations fail to be Gaussian, the central limit theorem guarantees that
the superposition will be approximately Gaussian. When comparing the
COBE data with the predictions of a cosmic string model, for example, it is
perfectly adequate to treat $\Delta T/T$ as Gaussian, even though the underlying
perturbations are highly non-Gaussian.

⁹ That is, Gaussian initial conditions (together with homogeneity and isotropy) imply
random phases, but not conversely.
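The central-limit argument above can be illustrated with a toy: add up $n$ independent, strongly skewed contributions and watch the sum become Gaussian. Here the contributions are centred exponential draws, a hypothetical stand-in chosen only because its excess kurtosis (6 for one draw, falling as $6/n$ for a sum of $n$) is known; it is not a model of cosmic strings or any other source. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def excess_kurtosis(samples):
    """Fourth standardised moment minus 3; zero for a Gaussian."""
    z = (samples - samples.mean()) / samples.std()
    return np.mean(z**4) - 3.0

def superposition(n_sources, n_realisations=50_000):
    """Sum of n_sources independent, strongly non-Gaussian contributions
    (centred exponential draws), one sum per realisation."""
    draws = rng.exponential(size=(n_realisations, n_sources)) - 1.0
    return draws.sum(axis=1)

for n in (1, 10, 100):
    print(n, excess_kurtosis(superposition(n)))   # expected to fall roughly as 6 / n
```

The same logic is what justifies treating large-angle $\Delta T/T$ as Gaussian when each line of sight integrates over many independent sources.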
7. An Introduction to CMB Data Analysis

7.1. AN IDEALIZED EXPERIMENT

We will explore the key issues in CMB data analysis by first considering an
absurdly idealized experiment (the sort of thing only a theorist could dream
up). We will gradually introduce real-world complications to see what the
main issues are.

Imagine, then, an experiment that measured $\Delta T/T$ at many pixels that
cover the entire sky completely and uniformly. Furthermore, imagine that
each data point is a perfect, noise-free measurement. With this data set,
we could determine each coefficient $a_{lm}$ with essentially perfect accuracy
by inverting equation (38):

$$a_{lm} = \int\frac{\Delta T}{T}(\hat r)\,Y^*_{lm}(\hat r)\,d\Omega \approx \frac{4\pi}{N_{\rm pix}}\sum_{p}\frac{\Delta T}{T}(\hat r_p)\,Y^*_{lm}(\hat r_p). \tag{44}$$

Here $d\Omega$ is an element of solid angle in the direction of $\hat r$, $N_{\rm pix}$ is the total
number of pixels, and $\hat r_p$ is a unit vector in the direction of the $p$th pixel.
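The pixel sum in equation (44) can be tried on a toy map. To avoid depending on any particular spherical-harmonic library routine, this sketch handles only $m = 0$, where $Y_{l0} = \sqrt{(2l+1)/4\pi}\,P_l(\cos\theta)$ is real, and uses a grid that is uniform in $\cos\theta$ and $\phi$, so every pixel subtends the same solid angle $4\pi/N_{\rm pix}$. The grid sizes, the toy map, and the helper names are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial import legendre

def y_l0(l, cos_theta):
    """Y_l0 = sqrt((2l+1)/4pi) P_l(cos theta), real for m = 0."""
    coeffs = np.zeros(l + 1)
    coeffs[l] = 1.0                                   # selects P_l in the Legendre series
    return np.sqrt((2 * l + 1) / (4 * np.pi)) * legendre.legval(cos_theta, coeffs)

# Equal-area pixels: centres of a uniform grid in cos(theta) and phi.
n_mu, n_phi = 200, 400
mu = -1 + (np.arange(n_mu) + 0.5) * (2.0 / n_mu)
phi = (np.arange(n_phi) + 0.5) * (2 * np.pi / n_phi)
MU, PHI = np.meshgrid(mu, phi, indexing="ij")
n_pix = MU.size

def estimate_a_l0(sky, l):
    """Pixel-sum form of eq. (44) for m = 0 (no conjugate needed since Y_l0 is real)."""
    return 4 * np.pi / n_pix * np.sum(sky * y_l0(l, MU))

sky = 2.0 * y_l0(2, MU) - 0.5 * y_l0(5, MU)          # toy noise-free map: a_20 = 2, a_50 = -0.5
print(estimate_a_l0(sky, 2), estimate_a_l0(sky, 5), estimate_a_l0(sky, 3))
```

With full, uniform coverage the orthogonality of the $Y_{lm}$ survives pixelization to good accuracy, so each coefficient is recovered independently; this is the property that incomplete sky coverage destroys.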

Even in this hopelessly idealized experiment, we still can't measure the
angular power spectrum $C_l$ perfectly. The reason is that $C_l$ is an
ensemble-average quantity: it is the variance of the distribution from which
$a_{lm}$ is drawn. We have only a finite number, $2l+1$, of samples of this
distribution at each $l$. This fact, generally called cosmic variance,¹⁰ sets a
fundamental limit on how well we can ever hope to measure the angular
power spectrum.

If we assume Gaussian statistics, then the best estimator of $C_l$ is simply
the average of $|a_{lm}|^2$ over $m$:

$$\hat C_l = \frac{1}{2l+1}\sum_{m=-l}^{l}|a_{lm}|^2. \tag{45}$$

The quantity $(2l+1)\hat C_l/C_l$ is chi-squared distributed with $2l+1$ degrees
of freedom, and so $\hat C_l$ has a fractional uncertainty of

$$\frac{\mathrm{Var}^{1/2}(\hat C_l)}{C_l} = \sqrt{\frac{2}{2l+1}}. \tag{46}$$

The unfortunate fact, therefore, is that even in a perfect experiment we will
never know $C_l$ with a fractional uncertainty better than $(l+\frac12)^{-1/2}$. We
are stuck with a 63% uncertainty in the quadrupole power $C_2$ and a 30%
uncertainty in $C_{10}$, although we can in principle hope to determine $C_{1000}$
to 3%.

¹⁰ Cosmic variance is closely related to the failure of ergodicity. If $\Delta T/T$ were ergodic,
then the average value of $|a_{lm}|^2$, measured in different orientations over the sphere, would
be the ensemble-average quantity $C_l$. But $\Delta T/T$ isn't ergodic, so this doesn't work.
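Equation (46) and the percentages quoted above can be checked by simulation. The sketch assumes Gaussian $a_{lm}$ and uses a standard shortcut: for a real sky, the sum over $m$ of $|a_{lm}|^2$ has the same distribution as the sum of squares of $2l+1$ independent real Gaussians of variance $C_l$, so that is what it draws. The helper `simulate_cl_hat` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_cl_hat(l, C_l, n_trials):
    """Draw n_trials skies' worth of the 2l+1 coefficients a_lm and return eq. (45).

    For the sum of |a_lm|^2 it suffices to draw 2l+1 real Gaussians of variance C_l
    (the real and imaginary parts of the m != 0 terms combine to the same chi-squared).
    """
    a = rng.normal(scale=np.sqrt(C_l), size=(n_trials, 2 * l + 1))
    return np.mean(a**2, axis=1)

for l in (2, 10, 1000):
    analytic = np.sqrt(2.0 / (2 * l + 1))                  # eq. (46)
    cl_hat = simulate_cl_hat(l, C_l=1.0, n_trials=20_000 if l < 100 else 2_000)
    print(l, round(analytic, 4), round(cl_hat.std() / cl_hat.mean(), 4))
```

The analytic column gives about 0.63, 0.31, and 0.032 for $l = 2$, 10, and 1000, and the simulated scatter agrees; no amount of instrumental improvement can beat these numbers on a single sky.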

7.2. NOISE 

Let's mess up our nice, clean experiment by adding noise. Each pixel is no
longer a perfect measurement of $\Delta T/T$: the $i$th data point $d_i$ consists of a
sum of signal and noise,

$$d_i = \frac{\Delta T}{T}(\hat r_i) + n_i. \tag{47}$$

Let us assume that the noise $n_i$ in each pixel is independent and
Gaussian distributed, with some standard deviation $\sigma$. For the moment we
will assume homoskedasticity, that is, that $\sigma$ is the same in all pixels.
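We can still try to estimate $a_{lm}$ using equation (44),

$$\hat a_{lm} = \frac{4\pi}{N_{\rm pix}}\sum_{p=1}^{N_{\rm pix}} d_p\,Y^*_{lm}(\hat r_p), \tag{48}$$

and average over $m$ to get an estimate of $C_l$,

$$\hat C_l = \frac{1}{2l+1}\sum_{m=-l}^{l}|\hat a_{lm}|^2, \tag{49}$$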

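but this quantity will no longer be a good estimate of the true $C_l$; it will be
biased upward. Using equations (48) and (47), together with the fact that
$\langle n_pn_{p'}\rangle = \sigma^2\delta_{pp'}$, it is straightforward to check that

$$\langle|\hat a_{lm}|^2\rangle = C_l + \frac{4\pi}{N_{\rm pix}}\sigma^2. \tag{50}$$

The estimator $\hat C_l$ is the average of these quantities, so it too is biased
upward by $4\pi\sigma^2/N_{\rm pix}$.

We can of course get a better estimate of $C_l$ by subtracting off the noise
bias,

$$\hat C'_l = \hat C_l - \frac{4\pi}{N_{\rm pix}}\sigma^2. \tag{51}$$

We now have an unbiased estimator, but unfortunately the uncertainty of
$\hat C'_l$ has increased:

$$\frac{\mathrm{Var}^{1/2}(\hat C'_l)}{C_l} = \sqrt{\frac{2}{2l+1}}\left(1 + \frac{4\pi\sigma^2}{N_{\rm pix}C_l}\right). \tag{52}$$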
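The bias of equation (50), its removal in equation (51), and the inflated scatter of $\hat C'_l$ can all be seen in a short simulation. The sketch assumes white noise, so each $\hat a_{lm}$ is the true $a_{lm}$ plus an independent Gaussian of variance $w = 4\pi\sigma^2/N_{\rm pix}$, and it represents the $2l+1$ coefficients by real Gaussians, which reproduces the chi-squared statistics of the sum over $m$. The values $l = 10$, $C_l = 1$, $w = 0.5$ and the helper name are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

def noisy_estimates(l, C_l, noise_var, n_trials):
    """Return (C_hat, C_hat_prime) of eqs. (49) and (51) over many realisations.

    noise_var stands for 4*pi*sigma^2/N_pix, the per-coefficient noise variance
    in eq. (50); 2l+1 real Gaussians per realisation stand in for the a_lm.
    """
    size = (n_trials, 2 * l + 1)
    signal = rng.normal(scale=np.sqrt(C_l), size=size)
    noise = rng.normal(scale=np.sqrt(noise_var), size=size)
    c_hat = np.mean((signal + noise) ** 2, axis=1)
    return c_hat, c_hat - noise_var

l, C_l, w = 10, 1.0, 0.5
c_hat, c_prime = noisy_estimates(l, C_l, w, 50_000)
print(c_hat.mean())                                   # biased: close to C_l + w
print(c_prime.mean())                                 # unbiased: close to C_l
predicted = np.sqrt(2 / (2 * l + 1)) * (C_l + w)      # eq. (52) multiplied by C_l
print(c_prime.std(), predicted)
```

Subtracting a known constant removes the bias but cannot remove the extra variance the noise contributes, which is why the error bar in equation (52) grows with $\sigma^2/N_{\rm pix}$.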
7.3. A DIGRESSION ON STATISTICAL METHODS IN GENERAL

The problem we just considered was a classic example of statistical
parameter estimation. We had some data, $\{d\}$, from which we wanted to
estimate a parameter, $C_l$. We did it by choosing an estimator, $\hat C'_l$, which we
could compute from the data, and which we hoped would be close to the true
value of the parameter.

In the problem above, there was a fairly natural choice of an estimator,
but in general, for a more complicated problem, there may be no obvious
choice. There is no universal, "correct" way to choose an estimator, but
in many situations the maximum-likelihood estimator is a good choice. We
will illustrate maximum-likelihood estimators with a simple example.¹¹

Suppose that we have $M$ data points $x_i$, each of which is the sum of a
signal $s_i$ and some noise $n_i$. We will take both $s_i$ and $n_i$ to be Gaussian
random variables with zero mean. The variances of the signal and noise are

$$\langle s_i^2\rangle = S, \qquad \langle n_i^2\rangle = N, \tag{53}$$

and everything is uncorrelated:

$$\langle s_is_j\rangle = \langle n_in_j\rangle = \langle s_in_j\rangle = 0, \tag{54}$$

where the first two expressions assume $i \neq j$. Let us suppose we know the
noise variance $N$, and we want to estimate the unknown quantity $S$, using
a maximum-likelihood estimator.

The first step is to compute the probability density of the data for fixed
$S$. We want to know $p(\{x\}\mid S)$, where $p(\{x\}\mid S)\,d^Mx$ is the probability
of getting a set of data that lie within an infinitesimal volume $d^Mx$ at the
location of the actual data $\{x\}$. In this case, each $x_i$ is an independent
Gaussian with variance $S+N$,

$$p(x_i\mid S) = \frac{1}{\sqrt{2\pi(S+N)}}\exp\left(-\frac{x_i^2}{2(S+N)}\right), \tag{55}$$

and the joint probability density is the product

$$p(\{x\}\mid S) = \prod_{i=1}^{M}p(x_i\mid S) \tag{56}$$

$$\qquad\qquad = \left(2\pi(S+N)\right)^{-M/2}\exp\left(-\frac{\sum_i x_i^2}{2(S+N)}\right). \tag{57}$$

¹¹ The astute reader will have noticed that this is precisely the same problem we
considered at the end of the last subsection. We have simply changed all of the notation for
no good reason. To be specific, the correspondence with the previous problem goes like
this: $M \to 2l+1$, $S \to C_l$, $s_i \to a_{lm}$, $x_i \to \hat a_{lm}$, $N \to 4\pi\sigma^2/N_{\rm pix}$.
The probability density we have computed is a function of the data $\{x\}$
for fixed $S$. But the data are known, and $S$ is what we want to know. We
therefore choose to regard this probability density as a function of $S$ and
call it the likelihood:

$$L(S) = p(\{x\}\mid S). \tag{58}$$

When working with Gaussian probability distributions, it is often
convenient to work with the quantity $\mathcal L = -2\ln L$ instead. The
maximum-likelihood estimator, as its name suggests, is the value $\hat S$ of $S$ for which
$L$ is maximized (or $\mathcal L$ is minimized). In other words, it is the value of the
parameter for which it would have been most likely for us to get the data
we actually did.

In the problem at hand, the maximum-likelihood estimator is found by
differentiating

$$\mathcal L = M\ln 2\pi + M\ln(S+N) + \frac{\sum_i x_i^2}{S+N} \tag{59}$$

with respect to $S$, setting the result equal to zero, and solving for $S$. The
result is

$$\hat S = \frac{1}{M}\sum_{i=1}^{M}x_i^2 - N. \tag{60}$$

That is, we compute the mean-square value of the data points and subtract
off the noise bias. This is precisely what we did when we computed $\hat C'_l$ in
equation (51). Although we didn't know it at the time, we were using a
maximum-likelihood estimator.
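When no closed form is available, the same estimator is found by minimizing $\mathcal L$ numerically. The sketch below does this for equation (59) on simulated data and compares the result with the closed form (60). The sample size, the true $S$ and $N$, and the search bounds are illustrative assumptions; `scipy.optimize.minimize_scalar` with `method="bounded"` does the one-dimensional search.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
M, S_true, N = 500, 2.0, 1.0                                # illustrative values
x = rng.normal(scale=np.sqrt(S_true + N), size=M)           # simulated x_i = s_i + n_i

def minus_two_log_like(S):
    """Eq. (59): -2 ln L(S) for M independent Gaussians of variance S + N."""
    return M * np.log(2 * np.pi) + M * np.log(S + N) + np.sum(x**2) / (S + N)

# search S over (-N, 100]; S + N must stay positive for the variance to make sense
numeric = minimize_scalar(minus_two_log_like, bounds=(-N + 1e-9, 100.0), method="bounded")
closed_form = np.mean(x**2) - N                             # eq. (60)
print(numeric.x, closed_form)
```

The two numbers agree to the optimizer's tolerance; in realistic CMB problems, where the signal covariance is not diagonal, the numerical route is usually the only one available.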

In this case, the maximum-likelihood estimator turned out to be unbiased:
its ensemble average $\langle\hat S\rangle$ is equal to the correct value $S$. In general,
there is no guarantee that this will happen. To take a simple example,
suppose that we had chosen to estimate the quantity $S^{-1/2}$ instead of $S$. The
maximum-likelihood estimator would be $\hat S^{-1/2}$, and it is easy to see that this
quantity is highly positively biased.

Now we know how to estimate parameters. But in most cases an
estimator isn't much good without a way of quantifying the uncertainty in
it. Methods for doing this generally fall into two categories: the classical
or frequentist approach (e.g., Rice 1995) and the Bayesian approach (e.g.,
Berger 1985, Gull & Daniell 1978, Press 1996). We will discuss each in turn.

In the frequentist picture, we look at one value of the parameter $S$ at
a time, and try to determine if that value is so far from our estimator $\hat S$
that it is ruled out. Specifically, for each $S$, we compute the probability
distribution of the estimator $\hat S$. We use this probability distribution to
determine how likely it is that we would have gotten a value of $\hat S$ as far
off as we did, or worse. If the actual value of $\hat S$ is far off in the tail of the
probability distribution, then this probability will be low. If the probability
lies below some significance level (say 5%), we say that that value of $S$
is ruled out with 5% significance.¹² We repeat this process for a range of
values of $S$, and we say that the set of values that are not ruled out form
a 95% confidence interval for the parameter.
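For the toy problem of equations (53)-(60) this recipe can be carried out exactly, because for a trial $S$ the quantity $M(\hat S + N)/(S+N)$ is chi-squared with $M$ degrees of freedom. The sketch below scans a grid of trial $S$ values and keeps those for which an equal-tailed test does not reject at 5%. The equal-tailed choice, the grid, and the example numbers ($\hat S = 2$, $M = 100$, $N = 1$) are assumptions; `confidence_interval` is a hypothetical helper built on `scipy.stats.chi2`.

```python
import numpy as np
from scipy.stats import chi2

def confidence_interval(S_hat_obs, M, N, level=0.95, S_grid=None):
    """Frequentist interval for S by scanning trial values and testing each one.

    For a trial S, M (S_hat + N) / (S + N) is chi-squared with M degrees of freedom,
    so we ask how far out in either tail the observed S_hat lies.
    """
    if S_grid is None:
        S_grid = np.linspace(0.0, 10.0, 10_001)
    q = M * (S_hat_obs + N) / (S_grid + N)
    cdf = chi2.cdf(q, df=M)
    p_two_sided = 2 * np.minimum(cdf, 1 - cdf)       # equal-tailed test
    allowed = S_grid[p_two_sided >= 1 - level]       # values not ruled out
    return allowed.min(), allowed.max()

print(confidence_interval(S_hat_obs=2.0, M=100, N=1.0))
```

Note that no prior enters anywhere: each trial value of $S$ is judged only by how surprising the observed $\hat S$ would be if that value were true.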

For a frequentist, a value of the parameter is ruled out if there is a 
low probability of getting data that fits as badly as the actual data. The 
Bayesian approach is quite different in spirit: A Bayesian attempts to deter- 
mine the subjective probability distribution that characterizes her knowl- 
edge of the parameter given the data. Armed with that probability distri- 
bution, she can calculate how likely the parameter is to lie in any particular 
range. 

In order to implement the Bayesian strategy, we want to turn the
likelihood function $L(S) = p(\{x\}\mid S)$, which represents the probability of the
data given a value of the parameter, into $p(S\mid\{x\})$, the probability of the
parameter given the data. The way to do this is to invoke Bayes's theorem:

$$p(S\mid\{x\}) \propto p(\{x\}\mid S)\,p(S), \tag{61}$$

with the constant of proportionality chosen to make the integral of the
left-hand side equal one. The left-hand side of this equation is the posterior
probability distribution, and it is precisely what we are looking for: it tells
us the probability of a particular parameter value, given the data. On the
right-hand side we have the product of the likelihood function and the
prior distribution of the parameter $S$. The latter represents our state of
knowledge of $S$ before we looked at the data.

A Bayesian characterizes the uncertainty in a parameter estimate by
drawing a credible region around the estimate. A 95% credible region, for
example, is an interval $S_{\min} < S < S_{\max}$ such that there is a 95% posterior
probability that $S$ lies in that interval:

$$\int_{S_{\min}}^{S_{\max}}p(S\mid\{x\})\,dS = 0.95. \tag{62}$$

The boundaries $S_{\min}$ and $S_{\max}$ of the credible region are typically chosen
to have equal values of the posterior probability density.

Although the frequentist approach is the one most people think of when
they think of statistics, and although most scientists profess to prefer it,
many if not most error bars in cosmology are determined using Bayesian
techniques.

¹² Astrophysicists often phrase that same statement differently, saying that the value
is ruled out at 95% confidence.
The main objection people raise to the Bayesian approach is that the final
results depend on the prior distribution $p(S)$. For a true, orthodox Bayesian,
this is not really a problem: the Bayesian view is that all probabilities
represent our subjective knowledge, and that prior distributions are therefore
secretly built into all statistical reasoning. It is better, the argument goes, to
have the prior out in the open for all to see.

Whether or not you like this argument, there is no denying that in
practice choosing a prior can be tricky. If one has essentially no prior
knowledge about the parameter, then the prior distribution should be broad
and flat. [For a flat prior, we can see from equation (61) that the posterior
probability distribution is simply the likelihood function.] But even in this
situation, it is not generally obvious which "flat" prior to choose. For example,
if we are trying to estimate an element of the power spectrum $C_l$, should we
choose a prior that is flat in $C_l$ or one that is flat in $\sqrt{C_l}$? ($C_l$ is, after all,
a mean-square amplitude; maybe the r.m.s. amplitude is a more "natural"
choice.) Perhaps we should even choose a prior that is flat in $\ln C_l$, since
such a prior avoids choosing a preferred scale. It would be hard to say that
any of these choices is "wrong," but in some situations the result of a
calculation may depend on which choice is made. For an example, see Bunn
et al. (1994).

The situation is not as bad as it appears, however. If the data set in
question contains a good, strong detection of the parameter of interest, then
the likelihood function is sharply peaked, and the shape of the posterior
probability (61) is determined mostly by the likelihood rather than the
prior. Prior dependence is thus typically weak in the case of strong
detections. The situations where prior dependence is a serious problem are
typically those in which someone is trying to coax a value out of a data set
that is capable of only a weak constraint anyway.
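The size of the effect can be illustrated in the simplest possible case. The sketch below assumes an idealized full-sky, noise-free measurement of a single multipole $l$, so that the $2l+1$ coefficients $a_{lm}$ are independent Gaussians of variance $C_l$ with observed sample variance $\hat C_l$, and it takes priors of the form $p(C_l) \propto C_l^{p}$, with $p = 0$, $-1/2$ and $-1$ for priors flat in $C_l$, $\sqrt{C_l}$ and $\ln C_l$. Substituting $u = 1/C_l$ turns the posterior into a Gamma distribution, so the posterior mean has a closed form. The function and dictionary names are illustrative only.

```python
import math

# Priors p(C) ∝ C**power for the three "flat" choices discussed in the text.
PRIORS = {"flat in C_l": 0.0, "flat in sqrt(C_l)": -0.5, "flat in ln C_l": -1.0}


def posterior_mean(c_hat, n_modes, prior_power):
    """Posterior mean of C_l from n_modes Gaussian modes with sample variance c_hat.

    Assumes full-sky, noise-free data, so the likelihood is
    L(C) ∝ C**(-n_modes/2) * exp(-n_modes * c_hat / (2 C)).
    With u = 1/C the posterior is Gamma(shape, rate) with
    shape = n_modes/2 - prior_power - 1 and rate = n_modes * c_hat / 2,
    so E[C] = E[1/u] = rate / (shape - 1), finite only if shape > 1.
    """
    shape = 0.5 * n_modes - prior_power - 1.0
    rate = 0.5 * n_modes * c_hat
    if shape <= 1.0:
        return math.inf  # the posterior mean diverges for this prior
    return rate / (shape - 1.0)


if __name__ == "__main__":
    for ell in (2, 30):
        n = 2 * ell + 1  # number of a_lm at multipole l
        means = {name: round(posterior_mean(1.0, n, p), 3) for name, p in PRIORS.items()}
        print(ell, means)
```

With $\hat C_l = 1$, the quadrupole (five modes) gives posterior means of 5.0, 2.5 and about 1.67 for the three priors, while at $l = 30$ (61 modes) they agree to within a few per cent (about 1.07, 1.05 and 1.03).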

7.4. INCOMPLETE SKY COVERAGE 

We now return to our hypothetical CMB experiment. The next complica- 
tion we need to consider has to do with the fact that no actual experiment 
ever achieves complete sky coverage. In the case of COBE, pixels close to 
the Galactic plane are contaminated, leaving only about two thirds of the 
sky usable. All other experiments to date have covered even smaller patches 
of sky. 

This fact requires us to completely change our approach. As much as
we would like to estimate each $a_{lm}$ and hence each $C_l$ individually, in the
absence of complete sky coverage it is impossible to do so. There is in
fact no estimator of a particular $C_l$ that is "uncontaminated," i.e., that is
independent of all of the other $C_{l'}$.
We may decide that it is important to estimate each $C_l$ individually,
with the minimum possible contamination from other multipoles. Tegmark
(1996a) has devised power-spectrum estimators with this property in mind
and has applied them to both galaxy surveys (Tegmark 1995) and the four-year
COBE data (Tegmark 1996b). For instance, suppose we have our hearts set on
knowing the value of $C_{17}$ as well as possible. Since the power spectrum is
quadratic in $\Delta T/T$, it is natural to choose a quadratic estimator,

$$\hat C_{17} = \sum_{i,j} A_{ij}\, d_i d_j - B. \tag{63}$$

Here $d_i$ is a data point, and we want to choose the matrix elements $A_{ij}$
and the bias correction $B$ in order to get as good an estimator as possible.
Tegmark (1996a) proposes that we choose these quantities to make our
estimator unbiased and to minimize the dependence of $\hat C_{17}$ on all of the other
$C_l$'s. He shows that it is impossible to completely remove contamination
from other multipoles and that in general the "spectral resolution" $\Delta l$ of
an experiment is approximately the reciprocal of the angular scale $\Delta\theta$
covered by the sky map. In particular, for an experiment like COBE, $\Delta\theta \sim 1$
radian, and it turns out that it is possible to estimate a particular $C_l$ with
significant contamination only from modes with $\Delta l \sim 2$ (Tegmark 1996a,
1996c).
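The origin of the contamination is easy to see in a toy model. The sketch below replaces the sphere by a circle of pixels (an assumption made purely to avoid spherical harmonics), with the Fourier modes $\cos m\theta$ and $\sin m\theta$ playing the role of the $Y_{lm}$ and the powers $C_m$ playing the role of $C_l$. It builds the naive estimator of the form (63) that simply projects the data onto mode $k$; this is not Tegmark's optimized estimator. Since $\langle \vec d^{\,T} A \vec d\rangle = \mathrm{Tr}(AM) = \sum_m C_m \mathrm{Tr}(A P_m) + \mathrm{Tr}(AN)$, where $P_m$ is the covariance contributed by unit power in mode $m$, the choice $B = \mathrm{Tr}(AN)$ removes the noise bias and $\mathrm{Tr}(A P_m)$ measures the leakage from mode $m$. The function names and the cut $|\sin\theta| > 0.3$ are hypothetical.

```python
import numpy as np


def mode_projector(theta, m):
    """Covariance P_m = c c^T + s s^T contributed by unit power in Fourier mode m."""
    c = np.cos(m * theta)
    s = np.sin(m * theta)
    return np.outer(c, c) + np.outer(s, s)


def naive_quadratic_estimator(theta, k, m_max, noise_var):
    """Naive estimator C_k_hat = d^T A d - B on the pixels at angles theta.

    A projects onto mode k and is scaled so that Tr(A P_k) = 1.
    Returns (A, B, window) with B = Tr(A N) for white noise of variance noise_var
    and window[m - 1] = Tr(A P_m), the response to unit power in mode m.
    """
    p_k = mode_projector(theta, k)
    a = p_k / np.trace(p_k @ p_k)
    bias = noise_var * np.trace(a)
    window = np.array([np.trace(a @ mode_projector(theta, m)) for m in range(1, m_max + 1)])
    return a, bias, window


n_pix = 256
theta_full = 2.0 * np.pi * np.arange(n_pix) / n_pix
theta_cut = theta_full[np.abs(np.sin(theta_full)) > 0.3]  # hypothetical "Galactic cut"

_, _, w_full = naive_quadratic_estimator(theta_full, k=5, m_max=10, noise_var=1.0)
_, _, w_cut = naive_quadratic_estimator(theta_cut, k=5, m_max=10, noise_var=1.0)
print("full circle:", np.round(w_full, 3))
print("cut circle: ", np.round(w_cut, 3))
```

On the full circle the response is one at $m = k$ and zero elsewhere; with the cut, the $m = k$ entry stays at one by construction but other modes acquire nonzero weights, which is the contamination described above.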

7.5. MAXIMUM-LIKELIHOOD PARAMETER ESTIMATION 

We may, however, decide that it isn't so important to estimate each $C_l$
individually. Often, a more fruitful approach is to parameterize the power
spectrum $C_l$ with a small number $k$ of parameters,

$$C_l = C_l(q_1, q_2, \ldots, q_k), \tag{64}$$

and use maximum-likelihood methods to estimate those parameters. This
is in fact the usual approach in CMB data analysis. Specific choices of the
parameters $\{q\}$ include the following:

• We may assume a shape for the power spectrum and estimate the
normalization. In this case, there is only one free parameter, which is
conventionally taken to be the quadrupole amplitude $\langle Q\rangle = T_0\sqrt{5C_2/4\pi}$,
where $T_0$ is the mean CMB temperature.^^ Most degree-scale experiments are only
powerful enough to determine a single number, the total power. One therefore
frequently assumes a "flat" power spectrum $l(l+1)C_l = \text{const}$, and estimates
the normalization, which in this context is often called $Q_{\rm flat}$.

• Both the normalization $\langle Q\rangle$ and the spectral index $n$ may be chosen as
free parameters. For a large-angle experiment like COBE, the predicted
power spectrum depends only weakly on many of the other parameters.

• White & Bunn (1995) have suggested a phenomenological parameterization
of the power spectrum. At large angular scales, many popular
theoretical models are well approximated by power spectra that are
quadratics in $\log l$. To be specific, we may set

$$l(l+1)C_l = D_1\left[1 + D'(\log_{10} l - 1) + \tfrac{1}{2}D''(\log_{10} l - 1)^2\right] \tag{65}$$

and work with a three-parameter family $(D_1, D', D'')$ of power spectra.

• We may choose to divide the power spectrum over the range probed
by a particular experiment into a small number of "bands." We then
estimate the power in each band, assuming that $l(l+1)C_l$ is constant
in each band. This has been done for COBE (Hinshaw et al. 1996)
and Saskatoon (Netterfield et al. 1996), although the latter uses a
completely different method.

^^A bewildering variety of notations exist in the literature. We choose to call this
quantity $\langle Q\rangle$ to emphasize the fact that it is a theoretical ensemble-average quantity. In
particular, it is not the same as the local quadrupole $Q_{\rm rms} = T_0\left(\sum_{m=-2}^{2}|a_{2m}|^2/4\pi\right)^{1/2}$.
The COBE group generally denotes its estimators of $\langle Q\rangle$ by $Q_{\rm rms\text{-}PS}$.
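For concreteness, the conversions implied by these parameterizations can be written out. The sketch below assumes $T_0 = 2.73$ K (expressed in $\mu$K) and $C_l$ defined for $\Delta T/T$. For the flat spectrum $l(l+1)C_l = 6C_2$ at every $l$, so the single parameter $\langle Q\rangle$ fixes the whole spectrum. The function names and the example normalization are illustrative.

```python
import math

T0_MICROKELVIN = 2.73e6  # assumed mean CMB temperature, 2.73 K in microkelvin


def c2_from_q(q_uk, t0_uk=T0_MICROKELVIN):
    """Invert <Q> = T0 * sqrt(5 C_2 / (4 pi)) for the quadrupole C_2 of Delta T / T."""
    return 4.0 * math.pi / 5.0 * (q_uk / t0_uk) ** 2


def flat_cl(ell, q_uk, t0_uk=T0_MICROKELVIN):
    """C_l of a flat spectrum, l(l+1) C_l = 6 C_2 for every l >= 2."""
    return 6.0 * c2_from_q(q_uk, t0_uk) / (ell * (ell + 1))


def white_bunn_cl(ell, d1, d_prime, d_double_prime):
    """C_l from the quadratic-in-log10(l) parameterization of equation (65)."""
    x = math.log10(ell) - 1.0
    return d1 * (1.0 + d_prime * x + 0.5 * d_double_prime * x * x) / (ell * (ell + 1))


if __name__ == "__main__":
    q = 18.0  # illustrative normalization in microkelvin
    for ell in (2, 10, 20):
        print(ell, flat_cl(ell, q), white_bunn_cl(ell, 6.0 * c2_from_q(q), 0.0, 0.0))
```

Setting $D' = D'' = 0$ recovers the flat spectrum with $D_1 = 6C_2$; in general $D_1$ is the value of $l(l+1)C_l$ at $l = 10$, where $\log_{10} l - 1$ vanishes.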

No matter what parameterization we adopt, we need a way to compute 
the likelihood L for a given power spectrum. As long as we assume Gaussian 
statistics, it is relatively easy to write down a formula for the likelihood, 
although as we shall see it can be cumbersome to compute it in practice. 

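We begin by introducing some notation. Each data point $d_i$ is as usual
the sum of the signal $\Delta T/T(\hat{\mathbf r}_i)$ and noise $n_i$. Expanding $\Delta T/T$ in spherical
harmonics, we have

$$d_i = \sum_{l,m} a_{lm} Y_{lm}(\hat{\mathbf r}_i) + n_i. \tag{66}$$

Let us denote a pair of indices $(lm)$ by a single Greek index $\mu$. The
correspondence is $\mu = l(l+1) + m + 1$, so that $\mu$ ranges from 1 to $\infty$ as $(lm)$
take on all of their allowed values. Then we can write equation (66) more
compactly as

$$\vec d = Y\vec a + \vec n, \tag{67}$$

where $\vec d$ is the $N_{\rm pix}$-dimensional data vector, $\vec n$ is the
noise vector, and the infinite-dimensional vector $\vec a = (a_1, a_2, \ldots, a_\mu, \ldots)$
contains the spherical harmonic coefficients. The $N_{\rm pix}\times\infty$-dimensional
spherical harmonic matrix $Y$ has elements

$$Y_{i\mu} = Y_{lm}(\hat{\mathbf r}_i). \tag{68}$$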



We denote vectors that live in abstract spaces such as "pixel space" by arrows, and 
vectors in real three-dimensional space are written in boldface. 
The statistical properties of $\vec d$ are determined by the properties of $\vec a$ and
$\vec n$. Assuming Gaussian statistics, both are Gaussian random vectors with
zero mean and covariances given by

$$\langle a_\mu a_\nu^*\rangle = C_\mu \delta_{\mu\nu} \equiv C_{\mu\nu}, \tag{69}$$
$$\langle n_i n_j\rangle = \sigma_i^2 \delta_{ij} \equiv N_{ij}, \tag{70}$$
$$\langle a_\mu n_i\rangle = 0. \tag{71}$$

($C_\mu = C_l$, where $l$ is the index corresponding to $\mu$, and $C$ and $N$ are diagonal
matrices.) Since $\vec d$ is a linear combination of $\vec a$ and $\vec n$, it too is a multivariate
Gaussian, and the likelihood function therefore has the form

$$L(C_l) \equiv p(\vec d \mid C_l) = \frac{1}{(2\pi)^{N_{\rm pix}/2}(\det M)^{1/2}} \exp\left(-\tfrac{1}{2}\vec d^{\,T} M^{-1}\vec d\right). \tag{72}$$

The superscript $T$ denotes a transpose, and the covariance matrix $M$ is given by

$$M = \langle \vec d\,\vec d^{\,T}\rangle = \langle (Y\vec a + \vec n)(Y\vec a + \vec n)^T\rangle = YCY^T + N. \tag{73}$$

In principle, we are now ready to estimate parameters. Equation (72)
tells us how to compute the likelihood for any particular power spectrum $C_l$,
so all we need to do is hunt through our parameter space for the parameters
that maximize the likelihood.
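In practice equation (72) is usually evaluated through a Cholesky factorization $M = LL^T$, which gives both $\ln\det M = 2\sum_i \ln L_{ii}$ and $\vec d^{\,T}M^{-1}\vec d = |L^{-1}\vec d|^2$ without forming $M^{-1}$ explicitly. The sketch below assumes NumPy, a random stand-in for the matrix $Y$, and a one-parameter family $M(q) = qYCY^T + N$; all names and sizes are hypothetical, and the crude grid search stands in for the hunt through parameter space.

```python
import numpy as np


def minus_two_ln_likelihood(d, m):
    """-2 ln L for zero-mean Gaussian data d with covariance m (equation (72))."""
    chol = np.linalg.cholesky(m)               # m = L L^T, L lower triangular
    z = np.linalg.solve(chol, d)               # z = L^{-1} d, so z.z = d^T m^{-1} d
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return d.size * np.log(2.0 * np.pi) + log_det + z @ z


rng = np.random.default_rng(0)
n_pix, n_modes, noise_var = 200, 40, 0.5
y = rng.standard_normal((n_pix, n_modes)) / np.sqrt(n_modes)  # stand-in for Y
c_fid = np.ones(n_modes)                                       # stand-in for diag(C)
signal_cov = y @ np.diag(c_fid) @ y.T                          # Y C Y^T
noise_cov = noise_var * np.eye(n_pix)                          # N

# Simulate d = Y a + n with true amplitude q = 1.
a = np.sqrt(c_fid) * rng.standard_normal(n_modes)
d = y @ a + np.sqrt(noise_var) * rng.standard_normal(n_pix)

# Each likelihood evaluation factors a fresh n_pix x n_pix matrix.
amplitudes = np.linspace(0.2, 3.0, 57)
chi = [minus_two_ln_likelihood(d, q * signal_cov + noise_cov) for q in amplitudes]
q_best = amplitudes[int(np.argmin(chi))]
print("maximum-likelihood amplitude:", q_best)
```

Replacing the grid by a proper optimizer, or adding parameters, does not change the basic cost: every evaluation requires a new factorization of $M$.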

In fact, for a typical degree-scale experiment with tens or at most hundreds
of pixels, this is essentially what is done. For a large data set such as
COBE, though, there are too many pixels for this to be convenient: each
time we wish to compute a likelihood, we must invert the $N_{\rm pix}\times N_{\rm pix}$ matrix
$M$. For COBE, therefore, we must implement some form of "data compression"
to make the analysis tractable. (Data compression will be even more
essential for a future satellite experiment with orders of magnitude more
pixels than COBE.)

7.6. BEAM-SMOOTHING AND CHOPPING 

Before we discuss data compression, though, we need to discuss one more
issue. The hypothetical experiment we have been discussing is still overly
idealized in one important way. We have assumed that the signal measured
by the experiment is the temperature anisotropy $\Delta T/T$ at a point. In reality,
no experiment has perfect resolution, so the observed signal is actually
the convolution of $\Delta T/T$ with some beam pattern or point-spread function.
Furthermore, many experiments chop their beams between two (or more)
points on the sky, with the measured signal being a difference between these
points.
The effect of the beam pattern on our analysis is fairly simple. Let $B(\alpha)$
represent the response of the instrument to a point an angular distance $\alpha$
from the line of sight. (We assume that the beam pattern is azimuthally
symmetric.) Then what the experiment actually measures is the convolution
of the anisotropy with the beam pattern,

$$\left(\frac{\Delta T}{T} * B\right)(\hat{\mathbf r}) = \sum_{l,m} \tilde a_{lm} Y_{lm}(\hat{\mathbf r}). \tag{74}$$

The coefficients $\tilde a_{lm}$ are related to the true anisotropy coefficients $a_{lm}$ by^^

$$\tilde a_{lm} = B_l\, a_{lm}, \tag{75}$$

where $B_l$ is the expansion in Legendre polynomials of $B$,

$$B_l = 2\pi\int_{-1}^{1} d\cos\alpha\, B(\alpha)\, P_l(\cos\alpha). \tag{76}$$

If the beam pattern happens to be a Gaussian,

$$B(\alpha) \propto \exp(-\alpha^2/2\sigma^2), \tag{77}$$

then, with $B$ normalized to unit integral over the sphere (so that $B_0 = 1$) and $\sigma \ll 1$,
the Legendre coefficients are

$$B_l = \exp\left[-\tfrac{1}{2}\sigma^2 l(l+1)\right]. \tag{78}$$

Note that, as expected, $B_l$ is very small for $l \gg \sigma^{-1}$, i.e., for angular scales
much smaller than the beam width.

We can adapt all of the previous results of this section to take beam-smoothing
into account by simply saying that our experiment is measuring
the beam-smoothed power spectrum,

$$\tilde C_l = C_l B_l^2, \tag{79}$$

instead of $C_l$.
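As a numerical illustration of equations (78) and (79), the sketch below converts a beam's full width at half maximum to $\sigma$ (for a Gaussian, FWHM $= \sigma\sqrt{8\ln 2}$) and applies the smoothing factor to a spectrum. The 7° FWHM in the usage example is the COBE DMR beam quoted in Section 8; the function names are illustrative.

```python
import math


def fwhm_to_sigma(fwhm_deg):
    """Gaussian beam width sigma (radians) from a full width at half maximum in degrees."""
    return math.radians(fwhm_deg) / math.sqrt(8.0 * math.log(2.0))


def gaussian_beam_bl(ell, sigma):
    """Legendre coefficient B_l = exp(-sigma^2 l(l+1) / 2) of equation (78)."""
    return math.exp(-0.5 * sigma * sigma * ell * (ell + 1))


def smoothed_cl(cls, sigma):
    """Beam-smoothed spectrum C_l B_l^2 (equation (79)); cls[l] is C_l starting at l = 0."""
    return [c * gaussian_beam_bl(ell, sigma) ** 2 for ell, c in enumerate(cls)]


if __name__ == "__main__":
    sigma = fwhm_to_sigma(7.0)
    for ell in (2, 10, 20, 30):
        print(ell, round(gaussian_beam_bl(ell, sigma) ** 2, 3))
```

For a 7° beam $\sigma \approx 0.052$ rad, and $B_l^2$ has fallen below 0.1 by $l = 30$.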

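We can account for the effect of beam-switching in a similar way. Consider
an experiment that chops between two points with spherical coordinates
$(\theta, \phi + \tfrac{1}{2}\alpha)$ and $(\theta, \phi - \tfrac{1}{2}\alpha)$. Ignoring beam-smoothing, the observed
signal $d$ is the difference in the anisotropy between these two points:

$$d = \frac{\Delta T}{T}(\theta, \phi + \tfrac{1}{2}\alpha) - \frac{\Delta T}{T}(\theta, \phi - \tfrac{1}{2}\alpha) \tag{80}$$

$$\;\;= \sum_{l,m} a_{lm}\left[Y_{lm}(\theta, \phi + \tfrac{1}{2}\alpha) - Y_{lm}(\theta, \phi - \tfrac{1}{2}\alpha)\right]. \tag{81}$$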

^^This result is simply the spherical version of the convolution theorem for Fourier
transforms, $\widehat{f*g} = \hat f\,\hat g$.
The azimuthal dependence of $Y_{lm}$ is $\exp(im\phi)$, so

$$d = \sum_{l,m} a_{lm} Y_{lm}(\theta,\phi)\left(e^{im\alpha/2} - e^{-im\alpha/2}\right) \tag{82}$$

$$\;\;= \sum_{l,m} a_{lm} Y_{lm}(\theta,\phi)\, 2i\sin\tfrac{1}{2}m\alpha. \tag{83}$$

The net result is that $a_{lm}$ is replaced by $2i\,a_{lm}\sin\tfrac{1}{2}m\alpha$, so modes with low
$|m|$ are suppressed. Since $m$ ranges from $-l$ to $l$, this suppression affects
primarily modes with low $l$.^^
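The orientation-independent part of this statement can be made quantitative with the addition theorem, $\sum_m Y_{lm}(\hat{\mathbf r}_1)Y_{lm}^*(\hat{\mathbf r}_2) = \frac{2l+1}{4\pi}P_l(\hat{\mathbf r}_1\cdot\hat{\mathbf r}_2)$. Averaging $|d|^2$ over the $a_{lm}$ then gives a contribution $\frac{2l+1}{4\pi}C_l \cdot 2[1 - P_l(\cos\gamma)]$ from each multipole, where $\gamma$ is the angle between the two beam centres ($\gamma = \alpha$ for a chop along the equator). The sketch below evaluates the factor $2[1 - P_l(\cos\gamma)]$, optionally multiplied by a Gaussian beam factor $B_l^2$ from equation (78). It covers the single-difference case only, and the function names are illustrative.

```python
import math


def legendre_p(l_max, x):
    """P_0(x) ... P_l_max(x) from the three-term recurrence."""
    p = [1.0, x]
    for ell in range(1, l_max):
        p.append(((2 * ell + 1) * x * p[ell] - ell * p[ell - 1]) / (ell + 1))
    return p[: l_max + 1]


def two_beam_factor(l_max, separation_deg, sigma=0.0):
    """2 [1 - P_l(cos gamma)] * B_l^2 for l = 0 ... l_max.

    separation_deg is the angle gamma between the two beam centres;
    sigma (radians) is an optional Gaussian beam width, B_l^2 = exp(-sigma^2 l(l+1)).
    """
    x = math.cos(math.radians(separation_deg))
    return [
        2.0 * (1.0 - p_l) * math.exp(-sigma * sigma * ell * (ell + 1))
        for ell, p_l in enumerate(legendre_p(l_max, x))
    ]


if __name__ == "__main__":
    w = two_beam_factor(40, separation_deg=2.0)
    print([round(v, 4) for v in w[:6]])
```

For small $\gamma$ the factor rises from zero at $l = 0$ roughly as $l(l+1)\gamma^2/2$, which is the low-$l$ suppression described above.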

This suppression is conventionally quantified by computing a "window
function" that represents the sensitivity of the experiment to different
multipoles. To do this, we compute the mean-square signal,

$$\langle d^2\rangle = \sum_{l,m} C_l\, |Y_{lm}(\theta,\phi)|^2 \left(2\sin\tfrac{1}{2}m\alpha\right)^2 \equiv \sum_l \frac{2l+1}{4\pi}\, C_l W_l. \tag{84}$$

The window function $W_l$ is small for low $l$, indicating that chopping has
rendered this experiment insensitive to the largest angular scales.

Note that we have not included beam-smoothing in equation (84). The
correct window function, including beam-smoothing, is obtained by multiplying
this result by $B_l^2$.

Equation (84) gives the window function for the particularly simple case
of a single-difference experiment. There are more complicated switching
strategies, including sinusoidal chops and triple-beam experiments. For a
more detailed discussion of window functions, see White & Srednicki (1995).

^^The fact that modes with low $|m|$ are suppressed depends on the fact that we have
oriented our coordinate system with the chop in the azimuthal direction. In contrast,
the statement that, on average, modes with low $l$ are suppressed is independent of the
orientation of the coordinate system.

8. Likelihood Analysis of the COBE Data

In the previous section, we discussed various issues of CMB data analysis
from a general point of view. We will now apply what we have learned to
a specific example, namely the COBE DMR data. We will not describe
the COBE instrument in detail; the interested reader is referred to George
Smoot's contribution to this volume, as well as to the papers reporting
the four-year DMR data (Bennett et al. 1996, Gorski et al. 1996, Hinshaw
et al. 1996, Banday et al. 1996) and references therein. We will content
ourselves with mentioning a few of the most relevant facts.

The COBE DMR produced all-sky maps of the microwave radiation at
three frequencies, 31 GHz, 53 GHz, and 90 GHz, with a beam size of 7°
(FWHM). The maps consist of 6144 pixels, although only about 4000 of
them are at high enough Galactic latitude to be used for studying the CMB.
Although the DMR is a differencing instrument, the data have been used
to produce sky maps of $\Delta T/T$, so we do not need to worry about beam-switching
in our analysis. We do, however, have to worry about the fact that
the maps are insensitive to the monopole and dipole of the anisotropy.^^

Figure 11. The likelihood function for the two-year COBE DMR data, based on a
brute-force analysis involving the entire pixel covariance matrix. Only the Sachs-Wolfe
contribution to the anisotropy is included. See Tegmark & Bunn (1995) for further details.

The noise in the COBE maps appears to be Gaussian, and different pixels
have noise that is approximately uncorrelated (Lineweaver et al. 1994).
Therefore, as long as the CMB anisotropy obeys Gaussian statistics, equation
(72) applies:

$$\mathcal{L} \equiv -2\ln L = \ln\left[(2\pi)^{N_{\rm pix}}\det M\right] + \vec d^{\,T} M^{-1}\vec d, \tag{85}$$

where $M = YCY^T + N$ with $N_{ij} = \sigma_i^2\delta_{ij}$. The matrix $M$ is about
$4000\times 4000$, which is a size that can be inverted, with sufficient patience, on a
workstation. Tegmark & Bunn (1995) have performed such a brute-force
analysis on the two-year COBE DMR data for a two-parameter family of
power spectra, with results shown in Figure 11. However, if we wish to
explore a larger parameter space, we must find a more efficient way to
compute likelihoods.
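A rough operation count shows what is at stake. Factoring or inverting a dense $n\times n$ matrix costs of order $n^3$ operations (about $n^3/3$ for a Cholesky factorization), so shrinking the data vector from the roughly 4000 usable pixels to the 500 compressed modes retained in Section 8.4 cuts the cost of each likelihood evaluation by about $(4000/500)^3 = 512$. The sketch below is only this arithmetic; the leading-order count ignores lower-order terms and memory traffic.

```python
def cholesky_flops(n):
    """Leading-order floating-point operation count, n**3 / 3, for a dense Cholesky factorization."""
    return n ** 3 / 3.0


pixels, compressed_modes = 4000, 500  # usable COBE pixels; modes kept in Section 8.4
speedup = cholesky_flops(pixels) / cholesky_flops(compressed_modes)
print(f"{cholesky_flops(pixels):.2e} vs {cholesky_flops(compressed_modes):.2e} operations; ratio {speedup:.0f}")
```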

^^Actually, COBE is in principle perfectly sensitive to the dipole; however, the intrinsic 
CMB dipole is impossible to distinguish from the much larger dipole due to our own 
motion with respect to the CMB center-of-momentum frame. 
8.1. DATA COMPRESSION

All likelihood analyses of the COBE data, with the exception of the brute-force
analysis mentioned above, have involved some form of data compression.
That is, the pixel data $\vec d$ has been mapped to some smaller-dimensional
data vector, which has been used for computing likelihoods. We will focus
on linear methods of data compression, in which the compressed data vector
$\vec x$ is linear in $\vec d$,

$$\vec x = A\vec d, \tag{86}$$

for some $K\times N_{\rm pix}$ matrix $A$ with $K < N_{\rm pix}$. Since $\vec x$ is a Gaussian random vector,
we can use equation (72) to compute the likelihood of $\vec x$ in terms of the
covariance matrix

$$\tilde M = \langle \vec x\,\vec x^{\,T}\rangle = AMA^T. \tag{87}$$

Of course, the likelihood computed in this way will not be the same as the 
true likelihood computed from d, but we can hope that, if we perform our 
data compression wisely, we will get a reasonable approximation to the true 
likelihood. 

In effect, linear data compression is equivalent to expanding the sky
map in a set of normal modes, namely the rows of $A$. Each element of the
compressed data vector $\vec x$ is approximately the integral of the sky map,
multiplied by some function: heuristically, we can write

$$x_a \approx \int d\Omega\, A_a(\hat{\mathbf r})\,\frac{\Delta T}{T}(\hat{\mathbf r}). \tag{88}$$

If our pixels uniformly covered the whole sky, we would choose these mode
functions to be the spherical harmonics by setting $A_{\mu j} = Y_\mu(\hat{\mathbf r}_j)$. Then $x_\mu$
would be an estimate of $a_\mu$ (up to an overall normalization). In fact, we
would be performing precisely the analysis described in Section 7.2.

Even though we do not actually have complete sky coverage, there is
still nothing stopping us from choosing the rows of $A$ to be the spherical
harmonics. This is in fact the technique described by Gorski (1994), which
has been applied to the DMR data by Gorski et al. (1994, 1996).^^ By cutting
off the spherical harmonic expansion at $l = 30$, Gorski et al. compress
the data from ~4000 to ~1000 numbers, with little loss of cosmological
information. This is possible because the cosmic signal in the data drops
off rapidly with increasing $l$ (due to both the beam cutoff and the shape of
the anisotropy power spectrum), while the noise has approximately equal
power in all modes.

^^Gorski's method involves the additional step of orthogonalizing the spherical harmonics
with an algorithm like Gram-Schmidt. Orthogonalizing with respect to the monopole
and dipole is an excellent way to render the data insensitive to these modes, but
orthogonalizing the modes with $l \geq 2$ with respect to each other has no effect on the
likelihoods.
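Linear compression is simple to express in code. The sketch below replaces the sky by a partially observed circle of pixels (an illustrative assumption standing in for the sphere), compresses the data onto the modes $\cos m\theta$ and $\sin m\theta$ with $m \le m_{\rm cut}$, the analogue of truncating the spherical harmonics at $l = 30$, and evaluates the likelihood of $\vec x$ with the covariance (87). It also checks a useful fact: if $A$ is square and invertible, $\mathcal{L}$ computed from $\vec x$ differs from the full-data value only by the model-independent constant $2\ln|\det A|$, so nothing is lost; the approximation enters only when $K < N_{\rm pix}$. All names, spectra and sizes are hypothetical.

```python
import numpy as np


def minus_two_ln_l(v, cov):
    """-2 ln L of zero-mean Gaussian data v with covariance cov, via Cholesky."""
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, v)
    return v.size * np.log(2.0 * np.pi) + 2.0 * np.sum(np.log(np.diag(chol))) + z @ z


def compressed_minus_two_ln_l(d, m, a):
    """-2 ln L of x = A d, whose covariance is A M A^T (equation (87))."""
    return minus_two_ln_l(a @ d, a @ m @ a.T)


def fourier_rows(theta, m_cut):
    """Compression matrix with rows cos(m theta), sin(m theta) for 1 <= m <= m_cut."""
    return np.array([f(m * theta) for m in range(1, m_cut + 1) for f in (np.cos, np.sin)])


def signal_covariance(theta, c_m):
    """Sum over m of C_m (c c^T + s s^T), i.e. C_m cos(m (theta_i - theta_j))."""
    diff = theta[:, None] - theta[None, :]
    return sum(c * np.cos(m * diff) for m, c in enumerate(c_m, start=1))


rng = np.random.default_rng(1)
theta_all = 2.0 * np.pi * np.arange(180) / 180
theta = theta_all[np.abs(np.sin(theta_all)) > 0.3]   # hypothetical sky cut
c_m = 1.0 / np.arange(1, 61) ** 2                     # hypothetical falling spectrum
s, n = signal_covariance(theta, c_m), 0.02 * np.eye(theta.size)
d = rng.multivariate_normal(np.zeros(theta.size), s + n)

a = fourier_rows(theta, m_cut=12)                     # K = 24 numbers instead of theta.size
for q in (0.5, 1.0, 2.0):
    print(q, minus_two_ln_l(d, q * s + n), compressed_minus_two_ln_l(d, q * s + n, a))
```

The printed columns can be compared as $q$ varies; in this toy about 96% of the signal power lies in the retained modes $m \le 12$, while the white noise is spread evenly over all modes.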



8.2. THE KARHUNEN-LOEVE TRANSFORM 

The Karhunen-Loeve transform (Karhunen 1947), which is also known as
optimal subspace filtering or expansion in signal-to-noise eigenmodes, is
another prescription for linear data compression. It was first introduced
to CMB data analysis by Bond (1994, 1995, 1996), and has been used
extensively on the COBE data (Bunn, Scott, & White 1995; Bunn 1995;
White & Bunn 1995; Bunn & Sugiyama 1996; Bunn, Liddle, & White 1996;
Bunn & White 1996) as well as in analyzing galaxy catalogues (Vogeley &
Szalay 1996).

Let us consider a one-parameter family of power spectra $C_l(q)$, where the
true value of $q$ is $q_0$. We wish to choose our method of data compression (i.e.,
the matrix $A$) to enable us to estimate $q$ as well as possible. Specifically,
we choose $A$ to maximize our ability to reject incorrect values of $q$.

On average, the likelihood function $L(q)$ has a peak at the true value
$q = q_0$, so $\langle L'(q_0)\rangle = 0$. The average rejection power is determined by the
rate at which the likelihood declines when we move away from this peak.
The figure of merit for describing rejection power is therefore

$$\gamma \equiv \left\langle \frac{\partial^2\mathcal{L}}{\partial q^2}\right\rangle_{q=q_0}. \tag{89}$$

The Karhunen-Loeve transform consists of choosing the compression matrix
$A$ to maximize $\gamma$ (for a fixed value of $K$, the dimension of the compressed
data vector).

To solve this optimization problem, we write down the likelihood in
terms of the reduced data vector $\vec x$,

$$\mathcal{L} = K\ln 2\pi + \mathrm{Tr}\left[\ln(AMA^T) + (AMA^T)^{-1}\vec x\,\vec x^{\,T}\right]. \tag{90}$$

Then we compute $\gamma$, vary a matrix element $A_{ij}$, and set $\partial\gamma/\partial A_{ij} = 0$. After some
algebra, we find that each row $\vec A_a$ of $A$ must satisfy an eigenvalue equation,

$$M'\vec A_a = \lambda_a M_0 \vec A_a. \tag{91}$$

Here $M_0$ is the covariance matrix $M$ corresponding to the correct parameter
value $q = q_0$, and

$$M' \equiv \left.\frac{\partial M}{\partial q}\right|_{q=q_0}. \tag{92}$$

The rejection power $\gamma$ is simply the sum of the squares of the eigenvalues,
$\gamma = \sum_a \lambda_a^2$, when the rows are normalized so that $\vec A_a^{\,T} M_0 \vec A_a = 1$.

This completes our prescription for choosing the matrix $A$. We should
choose the rows of $A$ to be the solutions of equation (91) with the largest
values of $|\lambda_a|$. Furthermore, we know when it is safe to stop adding new
rows: once all of the remaining eigenvalues $\lambda_a$ are small, we will no longer
significantly increase $\gamma$ by adding more rows to $A$.

To get an intuitive understanding of the Karhunen-Loeve transform,
consider the case where the parameter $q$ is the normalization of the power
spectrum, so $C_l(q) = qC_l^{(0)}$, with $C_l^{(0)}$ the true spectrum so that $q_0 = 1$.
Then we can rewrite the eigenvalue equation (91) as

$$M_{\rm noise}^{-1} M_{\rm signal}\vec A_a = \tilde\lambda_a \vec A_a, \tag{93}$$

where $\tilde\lambda_a = \lambda_a/(1 - \lambda_a)$, and $M_{\rm signal} = YCY^T$ and $M_{\rm noise} = N$ are the
signal and noise contributions to $M$. We can see from equation (93) that $\vec A_a$
is an eigenvector of $M_{\rm noise}^{-1}M_{\rm signal}$. This is why Bond (1994, 1995) calls it
an "eigenmode of the signal-to-noise ratio." In effect, the Karhunen-Loeve
transform tells us which directions in the $N_{\rm pix}$-dimensional pixel space are
most sensitive to the cosmic signal, and which are dominated by noise.
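For the normalization case just described, the eigenmodes can be computed with an ordinary symmetric eigensolver by first whitening the noise. Writing $N = LL^T$ (Cholesky) and diagonalizing $L^{-1}M_{\rm signal}L^{-T}$ gives the signal-to-noise ratios $\tilde\lambda_a$; the vectors $\vec A_a = L^{-T}\vec v_a$ built from its eigenvectors $\vec v_a$ then satisfy equation (93), and rescaling them by $(1+\tilde\lambda_a)^{-1/2}$ gives the normalization $\vec A_a^{\,T}M_0\vec A_a = 1$ with $\lambda_a = \tilde\lambda_a/(1+\tilde\lambda_a)$. The sketch below assumes NumPy, $q_0 = 1$, and a small toy covariance on a circle of pixels; it is not the spherical-harmonic-space implementation of Bunn (1995), and the function name and the 99% stopping rule are illustrative choices.

```python
import numpy as np


def kl_modes(signal_cov, noise_cov):
    """Karhunen-Loeve rows for M(q) = q S + N with q0 = 1, sorted by decreasing lambda.

    Returns (lam, rows): rows[a] satisfies S A_a = lam_a (S + N) A_a, equivalently
    N^{-1} S A_a = lam_a / (1 - lam_a) A_a, with rows @ (S + N) @ rows.T = identity.
    """
    chol = np.linalg.cholesky(noise_cov)          # N = L L^T
    l_inv = np.linalg.inv(chol)
    whitened = l_inv @ signal_cov @ l_inv.T       # L^{-1} S L^{-T}
    snr, vecs = np.linalg.eigh(whitened)          # signal-to-noise ratios, ascending
    order = np.argsort(snr)[::-1]
    snr, vecs = snr[order], vecs[:, order]
    rows = (l_inv.T @ vecs).T / np.sqrt(1.0 + snr)[:, None]
    return snr / (1.0 + snr), rows


# Toy problem: 120 pixels on a circle, hypothetical spectrum C_m = 1/m^2, uneven noise.
rng = np.random.default_rng(3)
theta = 2.0 * np.pi * np.arange(120) / 120
diff = theta[:, None] - theta[None, :]
signal_cov = sum(np.cos(m * diff) / m**2 for m in range(1, 41))
noise_cov = np.diag(rng.uniform(0.5, 2.0, theta.size))

lam, rows = kl_modes(signal_cov, noise_cov)
running = np.cumsum(lam**2) / np.sum(lam**2)      # normalized running sum of lambda^2
k_keep = int(np.searchsorted(running, 0.99)) + 1
compression = rows[:k_keep]                       # the K x N_pix matrix A
print(f"keep {k_keep} of {lam.size} modes")
```

Because the eigenproblem is solved once, every later likelihood evaluation works with the $K$-dimensional vector `compression @ d` instead of the full pixel vector.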

The reader, being an extraordinarily perceptive soul, is no doubt wondering
at this point whether this whole procedure is worth the trouble.
After all, our original goal was to avoid having to invert an $N_{\rm pix}\times N_{\rm pix}$
matrix. Now we find ourselves having to solve an $N_{\rm pix}$-dimensional eigenvalue
problem, which is much harder than simply inverting a matrix. Recall,
however, that our objection to a brute-force likelihood analysis was
that we didn't want to invert the large matrix $M$ repeatedly as we varied
the power spectrum. The Karhunen-Loeve eigenvalue problem needs to be
solved only once, with all future operations being performed on the
$K$-dimensional compressed data vector. Furthermore, it turns out that we can
save ourselves a lot of work by solving equation (91) in spherical harmonic
space rather than real space (Bunn 1995). Once we choose some cutoff $l_{\max}$,
the dimension of the eigenvalue problem is reduced from $N_{\rm pix}$ to $\sim l_{\max}^2$. It
turns out that none of the high signal-to-noise eigenmodes have significant
power beyond $l = 30$ or so, so we can safely choose $l_{\max}$ to be 40 or 50,
resulting in a substantial saving in computational effort.

The Karhunen-Loeve transform depends on a choice of power spectrum.
Ideally, we would like to use the true power spectrum, but of course we don't
know the true power spectrum.^^ We must therefore choose a fiducial power
spectrum more or less arbitrarily. In principle, this could lead to trouble:
we might find that the choice of fiducial power spectrum had a significant
effect on our final results. There are two ways to address this question:
we can repeat the analysis with different fiducial power spectra, and we
can perform Monte Carlo simulations to check that the likelihood analysis
returns unbiased estimates of the parameters of interest.^^

^^If we did, there would be no need to perform the analysis!

In the case of the COBE data, extensive tests have revealed that sensitivity
to the fiducial power spectrum is not a problem (Bunn 1995, Bunn
& White 1996). For example, the maximum-likelihood normalization of an
$n = 1$ Sachs-Wolfe spectrum is $\langle Q\rangle = 18.73 \pm 1.25\ \mu$K using an $n = 1$ fiducial
power spectrum and $\langle Q\rangle = 18.74 \pm 1.25\ \mu$K using an $n = 1.5$ power spectrum.
The maximum-likelihood value of $n$ also does not change when we
change the fiducial power spectrum. Furthermore, Monte Carlo simulations
show that our estimates of $\langle Q\rangle$ and $n$ are unbiased to an accuracy much
better than the statistical uncertainty (to within $\lesssim 0.03\ \mu$K and $\lesssim 0.05$, respectively).
See Bunn & White (1996) for further details.
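The Monte Carlo check described above can be sketched in a few lines if one works directly in a Karhunen-Loeve basis. Assuming $M(q) = qM_{\rm signal} + M_{\rm noise}$, with $q_0 = 1$ and rows of $A$ normalized so that $AM_0A^T$ is the identity, the compressed covariance is $AM(q)A^T = \mathrm{diag}[1 + (q-1)\lambda_a]$, so simulated compressed data can be drawn mode by mode. The eigenvalues, grid and trial count below are hypothetical, and the grid-search estimator is a simplification of a real maximum-likelihood fit.

```python
import numpy as np


def ml_amplitude(x, lam, grid):
    """Grid-search maximum-likelihood q for compressed data x with covariance diag(1 + (q - 1) lam)."""
    var = 1.0 + (grid[:, None] - 1.0) * lam[None, :]
    m2lnl = np.sum(np.log(var) + x[None, :] ** 2 / var, axis=1)  # -2 ln L up to a constant
    return grid[np.argmin(m2lnl)]


def monte_carlo_bias(lam, q_true, n_trials, rng, grid):
    """Mean error of the estimator over simulated data sets, with its standard error."""
    sd = np.sqrt(1.0 + (q_true - 1.0) * lam)
    estimates = np.array(
        [ml_amplitude(sd * rng.standard_normal(lam.size), lam, grid) for _ in range(n_trials)]
    )
    errors = estimates - q_true
    return errors.mean(), errors.std(ddof=1) / np.sqrt(n_trials)


lam = np.linspace(0.95, 0.0, 300)        # hypothetical KL eigenvalues
grid = np.linspace(0.05, 3.0, 296)       # hypothetical search grid, spacing 0.01
bias, stderr = monte_carlo_bias(lam, q_true=1.0, n_trials=400, rng=np.random.default_rng(4), grid=grid)
print(f"mean error {bias:+.4f} +/- {stderr:.4f}")
```

A mean error consistent with zero within a few standard errors is the kind of result such a bias test looks for.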

8.3. MONOPOLE AND DIPOLE REMOVAL 

Since the COBE data do not contain useful monopole and dipole informa- 
tion, it is customary to remove a best-fit monopole and dipole from the data 
before performing any further analysis. Unfortunately, since incomplete sky 
coverage destroys the orthogonality of the spherical harmonics, this proce- 
dure covertly removes part of the contribution of the higher multipoles. 
There are two ways to compensate for this. 

The first option is to treat the monopole and dipole coefficients ($a_{00}$
and $a_{1m}$) as "nuisance parameters," i.e., quantities whose true values we
neither know nor care about.^^ In the context of Bayesian analysis, the
natural thing to do with nuisance parameters is to marginalize over them.
Marginalizing over a nuisance parameter $\xi$ means replacing the likelihood
$L$ with the marginal likelihood

$$L_{\rm marg} = \int d\xi\, p(\xi)\, L(\xi). \tag{94}$$

Here $p(\xi)$ is a prior probability density for $\xi$, which is usually taken to be
constant. By marginalizing over $\xi$, we are using a standard identity
of probability theory,

$$p(\vec d \mid C_l) = \int d\xi\, p(\vec d \mid C_l, \xi)\, p(\xi), \tag{95}$$

to remove all $\xi$-dependence from the likelihood.

From a frequentist point of view, the natural way to get rid of a nuisance
parameter is to maximize with respect to it. That is, we replace $L$ with
$\max_\xi L(\xi)$. That way, a particular model is ruled out only if it is ruled out
for all possible values of $\xi$.

^^Even in methods that do not involve a choice of fiducial power spectrum, it is wise
to perform simulations to test for bias. Even a brute-force likelihood analysis using the
full pixel data is not guaranteed to return unbiased parameter estimates.

^^A parameter may be a nuisance parameter at one time and an interesting parameter
at another. For instance, if we want to estimate the spectral index $n$, we should probably
compute $L(\langle Q\rangle, n)$ and treat $\langle Q\rangle$ as a nuisance parameter. At some other time, though,
we may think $\langle Q\rangle$ is an interesting thing to know.

If we are performing some sort of data compression, then we have a 
second option for dealing with the monopole and dipole. We can simply 
impose a constraint on our compression matrix A, requiring it to be insen- 
sitive to the unwanted multipoles. This is in effect the approach of Gorski 
(1994): by orthogonalizing the spherical harmonics, he makes his compres- 
sion matrix insensitive to the monopole and dipole. This approach turns 
out to be mathematically equivalent to marginalizing over the unwanted 
modes. 
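This equivalence can be checked numerically. For templates $T$ (columns such as a monopole and dipole) whose amplitudes have flat priors, integrating equation (94) analytically gives, up to a model-independent constant, $\mathcal{L}_{\rm marg} = \ln\det M + \ln\det(T^TM^{-1}T) + \vec d^{\,T}[M^{-1} - M^{-1}T(T^TM^{-1}T)^{-1}T^TM^{-1}]\vec d$. Compressing with any full-rank $A$ whose rows are orthogonal to the templates should give the same likelihood differences between models. The sketch below checks this on a circle of pixels (a stand-in for the sphere) with templates $1$, $\cos\theta$ and $\sin\theta$; all names and numbers are hypothetical.

```python
import numpy as np


def marginal_m2lnl(d, m, templates):
    """-2 ln of the likelihood marginalized over template amplitudes (flat priors),
    dropping terms that do not depend on the model covariance m."""
    m_inv = np.linalg.inv(m)
    f = templates.T @ m_inv @ templates
    proj = m_inv - m_inv @ templates @ np.linalg.solve(f, templates.T @ m_inv)
    return np.linalg.slogdet(m)[1] + np.linalg.slogdet(f)[1] + d @ proj @ d


def projected_m2lnl(d, m, templates):
    """-2 ln L of x = A d (constant dropped), where the rows of A are orthonormal
    and orthogonal to every template column."""
    k = templates.shape[1]
    a = np.linalg.svd(templates.T)[2][k:]      # right singular vectors beyond rank k
    x, m_tilde = a @ d, a @ m @ a.T
    return np.linalg.slogdet(m_tilde)[1] + x @ np.linalg.solve(m_tilde, x)


theta_all = 2.0 * np.pi * np.arange(100) / 100
theta = theta_all[np.abs(np.sin(theta_all)) > 0.3]                  # hypothetical cut
templates = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
diff = theta[:, None] - theta[None, :]
signal = sum(np.cos(m * diff) / m**2 for m in range(2, 31))          # no m = 0, 1 power
noise = 0.1 * np.eye(theta.size)
rng = np.random.default_rng(5)
d = rng.multivariate_normal(np.zeros(theta.size), signal + noise) + templates @ np.array([3.0, -2.0, 1.0])

for q in (0.5, 1.0, 2.0):
    m_q = q * signal + noise
    print(q, marginal_m2lnl(d, m_q, templates), projected_m2lnl(d, m_q, templates))
```

The two columns differ by a constant offset, so differences between models, and any rejection of a model, come out the same; neither is affected by the template amplitudes added to `d`.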

People frequently remove the quadrupole information from the COBE
data in the same way as the monopole and dipole, on the grounds that the
quadrupole is particularly susceptible to Galactic contamination. It has also
been known since the earliest days of COBE analysis that the quadrupole
is anomalously low (compared to the prediction of a flat power spectrum
normalized to the other multipoles). From a statistical point of view, this
is a delicate situation: it is perfectly acceptable, and even wise, to throw
away data if there is a reasonable fear of contamination, but throwing
away data that is known a priori to be discordant with favored theories is
a major statistical faux pas. On balance, it is probably better to leave the
quadrupole information in, in the interest of avoiding even the possibility of
biased editing of the data.

There is another argument in favor of retaining the quadrupole. Even if 
the quadrupole is contaminated, it still contains useful information, and so 
it may be unwise to throw it away entirely. Since the quadrupole is a root- 
mean-square quantity, any contaminant would tend to bias the quadrupole 
up. In fact, if a particular theory is ruled out because it predicts too large a 
quadrupole, hypothesizing an additional quadrupolar contaminant cannot 
save that theory: as long as the contaminant is statistically independent of 
the cosmic signal, the net result of hypothesizing a contaminant is neces- 
sarily to lower the likelihood of that theory. 
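The last argument can be illustrated with the idealized full-sky likelihood for a single multipole. If the $2l+1$ coefficients are Gaussian with total variance $v$ and observed sample variance $\hat C_l$, then $\mathcal{L}(v) = (2l+1)(\ln v + \hat C_l/v)$ up to a constant, and $d\mathcal{L}/dv = (2l+1)(v - \hat C_l)/v^2$ is positive for $v > \hat C_l$. A theory that already predicts $C_l > \hat C_l$ is therefore only made worse by adding the variance $c \ge 0$ of an independent contaminant. The numbers in the sketch below are hypothetical.

```python
import math


def m2lnl_multipole(variance, c_hat, n_modes):
    """-2 ln L (constant dropped) for n_modes Gaussian modes of the given variance
    whose observed sample variance is c_hat; assumes full-sky, noise-free data."""
    return n_modes * (math.log(variance) + c_hat / variance)


c_hat = 1.0        # hypothetical observed quadrupole power (arbitrary units)
c_theory = 3.0     # hypothetical prediction, larger than observed
for contaminant in (0.0, 0.5, 2.0):
    value = m2lnl_multipole(c_theory + contaminant, c_hat, n_modes=5)
    print(f"contaminant {contaminant}: -2 ln L = {value:.3f}")
```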

8.4. RESULTS 

The main purpose of this section is to discuss data analysis techniques, not
results; however, we will briefly present some results based on a Karhunen-Loeve
analysis of the four-year COBE DMR data. The reader is referred to
Bunn & White (1996) for a more detailed discussion.

The data set used for this analysis consists of a weighted average of the
53 and 90 GHz maps from the four-year DMR data. The maps are averaged
with weights inversely proportional to the noise variance, in order to minimize
noise in the average map. (This is equivalent to performing a joint
likelihood analysis of the individual maps.) We performed the Karhunen-Loeve
analysis using a flat fiducial power spectrum $l(l+1)C_l = \text{const}$, and
we retained the 500 most significant modes.

Figure 12. The points show the eigenvalues $\lambda_a$ of the four-year COBE data, sorted in
decreasing order. The solid curve is the running sum of $\lambda_a^2$, normalized to 1.

Figure 12 shows the eigenvalues $\lambda_a$, together with a running sum of
the squares of the eigenvalues. (Recall that this sum is proportional to the
rejection power $\gamma$.) This plot indicates that modes beyond the first 500 do
not significantly increase our ability to discriminate among models.

Figure 13 shows the likelihood function for low-density CDM models,
both with and without a cosmological constant. Figure 14 shows the
maximum-likelihood power spectrum, found by allowing each $C_l$ with
$2 \le l \le 19$ to vary independently. The error bars shown in this figure are standard
errors determined by approximating the likelihood near the peak as
a Gaussian. The standard errors are then the square roots of the diagonal
elements of the covariance matrix of this Gaussian. Error bars determined
in this way should be viewed with extreme caution. First, the likelihood is
not very well approximated by a Gaussian: on the contrary, it is strongly
skew-positive at low $l$. Second, these standard errors contain no information
about correlations between the errors. These correlations are largest
for pairs of modes whose $l$-values differ by 2. (Coupling between modes with
$\Delta l = 1$ is weak because the data have approximate reflection symmetry.)
The deceptively small error bar on the estimate of $C_2$ is largely due to the
failure of the Gaussian approximation for the likelihood, although the 15%
anticorrelation between $C_2$ and $C_4$ also plays a role.

Figure 13. Likelihood as a function of $\Omega_0$ for CDM models with zero cosmological
constant (left) and zero spatial curvature (right). The spectral index $n$ increases from 0.8
to 1.2 from top to bottom. The likelihoods are normalized so that a flat spectrum has
$L = 1$. See Bunn & White (1996) for further details.

Finally, Table 1 shows values of the small-scale fluctuation amplitude
$\sigma_8$ for various theoretical models. The observational constraint is
approximately $0.5 \lesssim \sigma_8 \lesssim 0.8$ (e.g., Viana & Liddle 1996).

8.5. WIENER FILTERING 

Until now, we have focused on attempts to estimate the angular power
spectrum $C_l$. While this is the most useful thing to do with a CMB data
set, other complementary approaches can be interesting in certain contexts.
For instance, we can assume that we know the angular power spectrum and
try to determine the underlying cosmic signal from a noisy sky map. That
is, we can attempt to filter a sky map, cleaning up the noise and leaving
the signal. The Wiener filter (Wiener 1949) is the optimal linear filter for this
purpose, in the sense of least squares. The recent use of Wiener filtering
in astrophysics is largely due to Rybicki & Press (1992), and the filter has
been applied to the COBE data by Bunn, Hoffman, & Silk (1996).

Suppose we have a data vector $\vec d$ containing signal and noise. We want
to apply a linear filter $F$ so that $\vec y = F\vec d$ approximates the true cosmic
signal $\Delta T/T$ in such a way that the mean-square deviation,

$$\left\langle \left(y_i - \frac{\Delta T}{T}(\hat{\mathbf r}_i)\right)^2\right\rangle, \tag{96}$$

is as small as possible. The solution to this optimization problem is the
Wiener filter,

$$F = M_{\rm signal}M^{-1}, \tag{97}$$

where $M$ is as usual the data covariance matrix and $M_{\rm signal}$ is the signal
contribution to $M$.

Model          $\Omega_0$   $\Omega_\Lambda$   $\Omega_{\rm HDM}$   $n$    $h$     $\Omega_B h^2$   $\sigma_8$
standard CDM   1.0          0.0                0.0                  1.0    0.50    0.0125           1.22
tilted CDM     1.0          0.0                0.0                  0.8    0.50    0.0250           0.72
MDM            1.0          0.0                0.2                  1.0    0.50    0.0150           0.79
ΛCDM           0.4          0.6                0.0                  1.0    0.65    0.0150           1.07
Open CDM       0.4          0.0                0.0                  1.0    0.65    0.0150           0.64
Low h CDM      1.0          0.0                0.0                  1.0    0.35    0.0150           0.74

TABLE 1. The predicted fluctuation amplitude $\sigma_8$ on scales of
$8\,h^{-1}$ Mpc for various CDM-like models. MDM is a "mixed dark matter"
model. All normalizations are from the four-year COBE DMR
data. See Bunn & White (1996) for further details.

Under the assumption of Gaussian statistics, the Wiener-filtered data is
also the maximum-likelihood estimator of $\Delta T/T$ at each point. Note that
in regions of very high noise, where we have little information, the Wiener
filter returns values near zero, because this is the most likely a priori value
of a zero-mean Gaussian.

Figure 15 shows a Wiener-filtered COBE sky map. Although the signal- 
to-noise ratio in the raw pixel maps is typically less than one per pixel, the 
largest-amplitude features in the filtered map are significant at the five 
sigma level per pixel. 

One of the main uses of the filtered maps is in making predictions for 
other experiments. Assuming Gaussian statistics, the full error covariance 
matrix of the Wiener-filtered map is known, and so we can produce maps
of a region of the sky with known uncertainties. For predictions of the CMB
sky as it should be seen by the Tenerife experiment, see Bunn, Hoffman, &
Silk (1996).
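For the filter of equation (97), assuming independent signal and noise with M = M_signal + M_noise (M_noise is a label introduced here for the noise covariance), a short calculation gives the covariance of the reconstruction error y − s as M_signal − M_signal M^{-1} M_signal. Its diagonal supplies per-pixel uncertainties, and dividing each filtered value by the square root of the corresponding diagonal element gives a per-pixel significance. The sketch below uses a hypothetical function name and a made-up three-pixel example; it illustrates the algebra, not the published COBE pipeline.

```python
import numpy as np


def wiener_map_with_errors(d, S, N):
    """Wiener-filter the data vector d and return (y, C).

    y = S M^{-1} d is the filtered map and C = S - S M^{-1} S is the
    covariance of the reconstruction error y - s, where M = S + N.
    """
    M = S + N
    Minv_S = np.linalg.solve(M, S)  # M^{-1} S
    y = Minv_S.T @ d                # (M^{-1} S)^T d = S M^{-1} d (M, S symmetric)
    C = S - S @ Minv_S              # S - S M^{-1} S
    return y, C


# Hypothetical example: 3 pixels, unit signal variance, very different noise levels.
S = np.eye(3)
N = np.diag([0.1, 1.0, 100.0])  # the third pixel is very noisy
d = np.array([2.0, 2.0, 2.0])

y, C = wiener_map_with_errors(d, S, N)
sigma = np.sqrt(np.diag(C))
print("filtered map:        ", y)
print("1-sigma errors:      ", sigma)
print("significance y/sigma:", y / sigma)
```

In this example the noisy third pixel is pulled almost to zero, and its error approaches the prior signal standard deviation, since the data there carry almost no information.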
Figure 14. The points represent the maximum-likelihood power spectrum, obtained
by letting all C_ℓ's between ℓ = 2 and 19 vary freely. A flat power spectrum with
⟨Q⟩ = 19 μK is plotted for comparison. The error bars are standard errors determined
by approximating the likelihood by a Gaussian near the peak. Because the Gaussian
approximation is poor, and because there are significant correlations between the
errors, these error bars can be deceptive. The small formal error on C_2 is
particularly misleading. See Bunn & White (1996) for further discussion.

9. Summary 

The main lesson to be learned from this entire institute is that this is an
exciting time in CMB research. The existing data already tell us a great
deal about cosmology, and in the next few years the data should continue
to improve dramatically. The high quality of present and future anisotropy
observations presents us with some challenges. We must understand our
theoretical models well enough to make accurate predictions, and we must
develop statistical tools that enable us to determine which predictions are
consistent with the data. Both of these challenges are currently being met
with ever-increasing success.

The tools for making accurate predictions, at least in linear models
like CDM, are by now quite well developed. Furthermore, in recent years
analytic and semianalytic approximations have dramatically improved our
understanding of the basic physical principles involved in anisotropy
formation.

The problem of data analysis is also much better understood today than 
it was five years ago (before there were any actual detections to analyze). 
However, it is important to remember that analysis of future data sets will 
present challenges that make the COBE analysis look easy. When sky maps 
contain a million pixels instead of a few thousand, data compression will be 
absolutely essential. It is already time to start thinking about this difficult
problem.

Figure 15. A sky map, in Aitoff projection, of the Wiener-filtered four-year DMR data.
The relative lack of structure near the Galactic plane is due to the fact that no data from
that region were used.

In addition, future experiments with higher resolution than COBE will
be more susceptible to foreground contamination. In the case of COBE,
it is believed that simply excising points too close to the Galactic plane
is sufficient to remove most of the foreground contamination; for future
high-sensitivity degree-scale experiments, more sophisticated methods will
be necessary.

We have seen that CDM-like theoretical models predict that vast
amounts of information are encoded in the CMB anisotropy power spectrum.
There is a very real hope that the CMB will give us accurate values for a
wide range of cosmological parameters. But even if the information is there,
we will have to do a lot of work to wrest it from the data.